Signal change apparatus, method, and program

ABSTRACT

A signal output unit outputs an acquired signal. A signal attribute value display unit displays a value of an attribute related to an element constituting a target represented by the acquired signal or a signal generation source in a state in which a change instruction of the value of the attribute is able to be received. A changed attribute value acquisition unit acquires a changed value of the attribute when the change instruction of the value of the attribute is received. A change unit changes the value of the attribute for which the change instruction has been received on the basis of the changed value of the attribute acquired by the changed attribute value acquisition unit. A changed signal output unit outputs a changed signal in which the value of the attribute has been changed.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a 371 U.S. National Phase of International Application No. PCT/JP2018/017404, filed on May 1, 2018, which claims priority to Japanese Application No. 2017-091733, filed May 2, 2017. The entire disclosures of the above applications are incorporated herein by reference.

TECHNICAL FIELD

The present invention relates to a signal change apparatus, method, and program for changing a signal.

BACKGROUND ART

Conventionally, a method for directly editing an image is known (Non-Patent Document 1).

Moreover, a method for representing attributes of an image as a one-dimensional vector (cVAE: conditional variational auto-encoder) and editing the image is also known (Non-Patent Document 2).

Moreover, a method for calculating an attribute vector as follows and attaching the calculated attribute vector to a target image is known (Non-Patent Document 3).

(Attribute vector) = (Average of latent variables of images including given attribute) − (Average of latent variables of images that do not include given attribute)
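The attribute vector calculation stated above can be sketched as follows. This is a minimal illustration and not code from Non-Patent Document 3: the function names `attribute_vector` and `attach`, the plain-list representation of latent variables, and the interpretation of "attaching" as element-wise addition with a hypothetical `strength` factor are all assumptions made here.

```python
from statistics import fmean


def attribute_vector(latents_with, latents_without):
    """Average latent of images with the attribute minus average latent of images without it."""
    dim = len(latents_with[0])
    mean_with = [fmean(z[d] for z in latents_with) for d in range(dim)]
    mean_without = [fmean(z[d] for z in latents_without) for d in range(dim)]
    return [a - b for a, b in zip(mean_with, mean_without)]


def attach(latent, attr_vec, strength=1.0):
    """Assumed form of 'attaching': add the scaled attribute vector to a target latent."""
    return [z + strength * a for z, a in zip(latent, attr_vec)]


# Toy two-dimensional latent variables (made-up values for illustration only).
vec = attribute_vector([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]])
edited = attach([0.5, 0.5], vec)
```

Because the whole latent vector is shifted, this kind of edit does not distinguish between the part of the latent that encodes identity and the part that encodes the attribute.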

It is to be noted that a latent variable is like an essence useful for representing an image.

PRIOR ART DOCUMENTS

Non-Patent Documents

-   Non-Patent Document 1: L. Liu, et al., “Wow! You Are So Beautiful Today!”, ACMMM 2013.
-   Non-Patent Document 2: X. Yan, et al., “Attribute2Image: Conditional Image Generation from Visual Attributes”, arXiv 2015.
-   Non-Patent Document 3: A. B. L. Larsen, et al., “Autoencoding beyond pixels using a learned similarity metric”, ICML 2016.

SUMMARY OF INVENTION

Problems to be Solved by the Invention

With the method described in the Non-Patent Document 1, it is difficult to cope with various types of input data because of strong constraints such as a face facing in front, short hair/bundled hair, and thin makeup/no makeup.

Moreover, with the method described in the Non-Patent Document 2, expressiveness is insufficient because the attributes are one-dimensionally represented.

Moreover, with the method described in the Non-Patent Document 3, it is difficult to edit the image because identity and an attribute within the latent variables extracted from the image are not separated.

The present invention has been made in consideration of the above-described circumstances and an object of the present invention is to provide a signal change apparatus, method, and program capable of appropriately changing a signal such as an image.

Means for Solving the Problems

In order to achieve the above-described object, a signal change apparatus according to a first aspect of the present invention includes: a signal output unit that outputs an acquired signal; a signal attribute value display unit that displays a value of an attribute related to an element constituting a target represented by the acquired signal or a signal generation source in a state in which a change instruction of the value of the attribute is able to be received; a changed attribute value acquisition unit that acquires a changed value of the attribute when the change instruction of the value of the attribute is received; a change unit that changes the value of the attribute for which the change instruction has been received on the basis of the changed value of the attribute acquired by the changed attribute value acquisition unit; and a changed signal output unit that outputs a changed signal in which the value of the attribute has been changed.

A signal change method according to a second aspect of the present invention includes: outputting, by a signal output unit, an acquired signal; displaying, by a signal attribute value display unit, a value of an attribute related to an element constituting a target represented by the acquired signal or a signal generation source in a state in which a change instruction of the value of the attribute is able to be received; acquiring, by a changed attribute value acquisition unit, a changed value of the attribute when the change instruction of the value of the attribute is received; changing, by a change unit, the value of the attribute for which the change instruction has been received on the basis of the changed value of the attribute acquired by the changed attribute value acquisition unit; and outputting, by a changed signal output unit, a changed signal in which the value of the attribute has been changed.

According to the first and second aspects, a value of an attribute related to an element constituting a target represented by an acquired signal or a signal generation source is displayed in a state in which an instruction for changing the value of the attribute is able to be received, the value of the attribute for which the change instruction has been received is changed on the basis of the changed value of the attribute when the instruction for changing the value of the attribute is received, and a changed signal in which the value of the attribute has been changed is output. Thereby, it is possible to appropriately change a signal.

In the signal change apparatus according to the first aspect, each of the acquired signal and the changed signal may be an image, and the attribute may be an attribute related to an element constituting a subject represented by the image.

In the signal change apparatus according to the first aspect, the signal attribute value display unit may display the value of the attribute by means of a controller indicating the value of the attribute in the state in which the change instruction of the value of the attribute is able to be received.

A signal change apparatus according to a third aspect of the present invention includes: a variable extraction unit that extracts a latent variable of an input signal; a change unit that changes a value of the latent variable extracted by the variable extraction unit; and a signal generation unit that generates a signal from the latent variable changed by the change unit.

A signal change method according to a fourth aspect of the present invention includes: extracting, by a variable extraction unit, a latent variable of an input signal; changing, by a change unit, a value of the latent variable extracted by the variable extraction unit; and generating, by a signal generation unit, a signal from the latent variable changed by the change unit.

According to the third and fourth aspects, a latent variable of an input signal is extracted, a value of the extracted latent variable is changed, and a signal is generated from the changed latent variable. Thereby, it is possible to appropriately change a signal.

In the signal change apparatus according to the third aspect, the variable extraction unit may extract latent variables from the input signal using a pre-learned first neural network, the extracted latent variables including a plurality of latent variables representing an attribute, the change unit may change a conversion result of a change target within conversion results obtained by converting the plurality of latent variables representing the attribute or a latent variable of a change target within latent variables other than the plurality of latent variables representing the attribute among the extracted latent variables using a value of an attribute vector representing the attribute in the input signal, and the signal generation unit may generate the signal from the conversion result of the change target or a conversion result obtained by the change unit changing the latent variable of the change target and the latent variables other than the plurality of latent variables using a pre-learned second neural network.

A signal change apparatus according to a fifth aspect of the present invention includes: a variable extraction unit that extracts a latent variable representing an attribute of an input signal; a change unit that changes a value of the latent variable extracted by the variable extraction unit by replacing the value of the extracted latent variable with a value of a latent variable representing an attribute extracted from a signal of a transfer source; and a signal generation unit that generates a signal from the latent variable changed by the change unit.

A signal change method according to a sixth aspect of the present invention includes: extracting, by a variable extraction unit, a latent variable representing an attribute of an input signal; changing, by a change unit, a value of the latent variable extracted by the variable extraction unit by replacing the value of the extracted latent variable with a value of a latent variable representing an attribute extracted from a signal of a transfer source; and generating, by a signal generation unit, a signal from the latent variable changed by the change unit.

According to the fifth and sixth aspects, a latent variable representing an attribute of an input signal is extracted, a value of the extracted latent variable is changed by replacing the value of the extracted latent variable with a value of a latent variable representing an attribute extracted from a signal of a transfer source, and a signal is generated from the changed latent variable. Thereby, it is possible to appropriately change a signal.

A program according to a seventh aspect of the present invention causes a computer to function as each unit constituting the signal change apparatus.

Advantageous Effects of Invention

According to a signal change apparatus, method, and program of the present invention, an advantageous effect that it is possible to appropriately change a signal can be obtained.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is an illustrative diagram of an attribute change screen in a first embodiment of the present invention.

FIG. 2 is a conceptual diagram of an encoder in the first embodiment of the present invention.

FIG. 3 is a conceptual diagram of a generator in the first embodiment of the present invention.

FIG. 4 is a diagram explaining a method for learning a generator and a discriminator.

FIG. 5 is a block diagram showing a configuration of a signal change apparatus according to first and second embodiments of the present invention.

FIG. 6 is a flowchart showing a learning process routine in the signal change apparatus according to the first and second embodiments of the present invention.

FIG. 7 is a flowchart showing a generation process routine in the signal change apparatus according to the first and second embodiments of the present invention.

FIG. 8 is a conceptual diagram of a generator in the second embodiment of the present invention.

FIG. 9 is a conceptual diagram of generators, discriminators, and approximation distributions in the second embodiment of the present invention.

FIG. 10 is an illustrative diagram of an attribute change screen for changing an attribute of an audio signal.

FIG. 11 is an illustrative diagram of an attribute change screen for changing an attribute of text data.

FIG. 12 is an illustrative diagram of an attribute change screen for changing an attribute of moving-image data.

FIG. 13 is a conceptual diagram of a generator, a discriminator, and approximation distributions in the second embodiment of the present invention.

MODES FOR CARRYING OUT THE INVENTION

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

Overview of First Embodiment of Present Invention

First, an overview of the first embodiment of the present invention will be described.

In the first embodiment, as shown in FIG. 1, it is possible to freely control an attribute and change an image, as in conventional image editing software, by moving a slide bar 96 corresponding to the value of the attribute or clicking a radio button 94 corresponding to the value of the attribute.

Here, a wide variety of data exists in the world. For example, there are images corresponding to various face orientations, images corresponding to various illumination conditions, images corresponding to various ages, and images corresponding to various facial expressions.

Therefore, in the present embodiment, latent variables (like an essence useful for representing an image) are extracted using a neural network E as shown in FIG. 2 and the value of a latent variable is changed, rather than directly editing an image.

Moreover, diversity is present even in one attribute. For example, even “bangs” have a variety of shapes.

Therefore, in the present embodiment, each attribute is represented by a plurality of latent variables, as shown in FIG. 3. Specifically, an attribute can be controlled easily by separating the plurality of latent variables into latent variables representing identity and latent variables representing the attribute. Sufficient expressiveness can be obtained by representing each attribute using a plurality of latent variables on an attribute-by-attribute basis. When an attribute is represented by a discrete value, a latent variable may be represented by continuous values which can be any values in a section defined by discrete values that can be taken by the attribute. A generator 2 obtains sufficient expressiveness by representing each attribute with a latent variable having continuous values. Likewise, when an attribute is represented by a given distribution, a latent variable may be represented by a more detailed distribution than the given distribution. The generator 2 obtains sufficient expressiveness by representing each attribute using a latent variable that follows the more detailed distribution. It is to be noted that among a plurality of latent variables, a latent variable (a latent variable z_(a) in FIG. 3) that is constrained by an attribute vector y to be described below is a latent variable representing an attribute. Moreover, among a plurality of latent variables, a latent variable (a latent variable z_(i) in FIG. 3) that is not constrained by the attribute vector y is a latent variable representing identity.

Moreover, when an attempt is made to learn the structures of an encoder 1 (FIG. 2) and the generator 2 (FIG. 3) in a straightforward manner, learning is performed so that the error between true data and data generated via the encoder 1 and the generator 2 is small. At this time, definite constraints cannot be given with respect to what is represented by each of the latent variable and the attribute vector y.

Therefore, in the present embodiment, as shown in FIG. 4, conditional filtered generative adversarial networks (CFGANs) are learned together when the generator 2 is learned. At this time, the CFGAN is constrained so that an image generated on the basis of a latent variable generated from a given data distribution includes a given attribute or does not include the given attribute depending on the attribute vector y. Moreover, a discriminator 3 discriminates whether or not the generated image follows the same distribution as a true image under the presence or absence or positive or negative of each attribute represented by the attribute vector y. That is, the discriminator 3 discriminates whether or not the generated image is a true image. It is to be noted that the positive or negative of an attribute is, for example, “male/female”, as will be described below. Thereby, various latent variables z_(i) and z_(a) can be constrained so as to represent identity and an attribute, respectively.

Moreover, in the present embodiment, when an attribute of an image is to be changed, the attribute is changed while the identity is maintained.

<Configuration of Signal Change Apparatus According to First Embodiment of Present Invention>

Next, a configuration of a signal change apparatus according to the first embodiment of the present invention will be described. As shown in FIG. 5, the signal change apparatus 100 according to the first embodiment of the present invention can be configured by a computer including a central processing unit (CPU), a random access memory (RAM), and a read only memory (ROM) that stores programs for executing a learning process routine and a generation process routine to be described below and various types of data. As shown in FIG. 5, the signal change apparatus 100 functionally includes an input unit 10, an arithmetic unit 20, and an output unit 90.

The input unit 10 receives a plurality of pairs of image data x and an attribute vector y as learning data. Moreover, the input unit 10 receives image data x that is a change target.

The arithmetic unit 20 includes a learning unit 30, a neural network storage unit 40, a prediction unit 50, a variable extraction unit 52, a signal output unit 53, a signal attribute value display unit 54, a changed attribute value acquisition unit 56, a prediction unit 58, a variable extraction unit 60, a change unit 62, a signal generation unit 64, and a changed signal output unit 66.

On the basis of the input learning data, the learning unit 30 learns a neural network G serving as the generator 2 and a neural network D serving as the discriminator 3 so that these neural networks follow optimization conditions that contend with each other. The neural network G serving as the generator 2 receives a generated latent variable z_(i) representing identity and a generated latent variable z_(a)′ representing each attribute and generates image data from the latent variable z_(i) representing identity and the latent variable z_(a)′ representing each attribute. The neural network D serving as the discriminator 3 discriminates whether or not the generated image data follows the same distribution as true image data x under the attribute vector y representing each attribute of the image data. That is, the neural network D serving as the discriminator 3 discriminates whether or not the generated image data is the true image data x. For example, the attribute vector y represents the presence or absence of an attribute or positive or negative of an attribute. However, the attribute vector y is not particularly limited thereto. The latent variable z_(a)′ representing each attribute that becomes an input to the neural network G serving as the generator 2 is obtained by converting the latent variable z_(a) representing each attribute using the value of the attribute vector y. As an example of conversion, it is conceivable that the generated latent variable z_(a) representing each attribute is multiplied by the attribute vector y as shown in the following formula when the attribute vector y represents the presence or absence of an attribute (i.e., y=1 when an attribute is present and y=0 when the attribute is absent).

[Expression 1]
$z_{a}^{\prime} = \begin{cases} z_{a} & (y = 1) \\ 0 & (y = 0) \end{cases} \quad (1)$

Alternatively, it is conceivable that a positive value (|z_(a)|) (when y=1) or a negative value (−|z_(a)|) (when y=0) is assigned to the generated latent variable z_(a) representing each attribute in accordance with the attribute vector y as shown in the following formula when the attribute vector y represents positive or negative of an attribute.

[Expression 2]
$z_{a}^{\prime} = \begin{cases} |z_{a}| & (y = 1) \\ -|z_{a}| & (y = 0) \end{cases} \quad (2)$
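The two conversions in Expressions (1) and (2) can be sketched in a few lines. This is only an illustrative sketch: the function names `filter_presence` and `filter_sign` and the use of a plain list for the latent variables of one attribute are assumptions made here, not names from the specification.

```python
def filter_presence(z_a, y):
    """Expression (1): keep z_a when y == 1, output zeros when y == 0."""
    return [z * y for z in z_a]


def filter_sign(z_a, y):
    """Expression (2): |z_a| when y == 1, -|z_a| when y == 0."""
    sign = 1.0 if y == 1 else -1.0
    return [sign * abs(z) for z in z_a]


# Toy latent variables for a single attribute (made-up values).
z_a = [0.3, -0.7]
present = filter_presence(z_a, 1)
absent = filter_presence(z_a, 0)
positive = filter_sign(z_a, 1)
negative = filter_sign(z_a, 0)
```

In both cases the attribute vector y fixes the coarse state of the attribute, while the magnitudes of z_(a) remain free to express variation within that state.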

Specifically, the learning unit 30 receives image data x and the attribute vector y of the input learning data and the latent variable z_(i) representing identity and the latent variable z_(a) representing each attribute generated from a given data distribution. Here, when the image data x is face image data, the attribute vector y represents the presence or absence of each of “glasses”, “makeup”, “beard”, and “bangs” and the distinction between “male/female”, “not smile/smile”, and “old/young”, and the latent variable z_(a) representing an attribute represents the diversity within each attribute (e.g., representing “What type of glasses are they?”). The learning unit 30 may generate the latent variable z_(i) representing identity and the latent variable z_(a) representing each attribute using random numbers.

Moreover, when the latent variable z_(a) representing an attribute is generated from a given data distribution, the learning unit 30 generates the latent variable z_(a) representing the attribute in accordance with the following formula, if, for example, the latent variable z_(a) representing the attribute is discrete.

[Expression 3]
$z_{a} \sim \mathrm{Cat}\left(K = k,\ p = \frac{1}{k}\right) \quad (3)$

where k represents the number of categories (the number of discrete values). Moreover, Cat represents a distribution having values indicating categories equal in number to the number of categories K, and p represents a probability.

Moreover, when the latent variable z_(a) representing the attribute is continuous, the learning unit 30 generates the latent variable z_(a) representing the attribute in accordance with the following formula.

[Expression 4]
$z_{a} \sim \mathrm{Unif}(-1, 1) \quad (4)$

where Unif(−1, 1) is a uniform distribution in which the range of values is from −1 to 1.

It is to be noted that a latent variable z_(a) that follows another distribution and conversion can be adopted. For example, it is possible to use a normal distribution instead of the uniform distribution (Unif(−1, 1)) as the distribution of the latent variable z_(a) and it is also possible to change the range of values.
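As a rough sketch of how latent variables might be drawn according to Expressions (3) and (4), the following uses Python's standard `random` module. The function names, the choice to return a category index for the discrete case, and the `dim` parameter for the number of continuous latent variables are assumptions made for illustration.

```python
import random


def sample_discrete(k, rng):
    """Expression (3): draw one of k categories, each with probability 1/k."""
    return rng.randrange(k)


def sample_continuous(dim, rng):
    """Expression (4): draw dim latent variables from Unif(-1, 1)."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
discrete_samples = [sample_discrete(4, rng) for _ in range(100)]
continuous_sample = sample_continuous(8, rng)
```

Swapping `rng.uniform(-1.0, 1.0)` for `rng.gauss(0.0, 1.0)` would correspond to the normal-distribution alternative mentioned above.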

Moreover, the learning unit 30 receives the generated latent variable z_(i) representing identity and the generated latent variable z_(a)′ representing each attribute and generates image data using the neural network G serving as the generator 2. At this time, the latent variable z_(a)′ representing each attribute that is an input of the neural network G serving as the generator 2 is obtained by converting the latent variable z_(a) representing each attribute using the value of the attribute vector y.

Then, the learning unit 30 updates a parameter of the neural network G serving as the generator 2 so as to satisfy a constraint that the discriminator 3 discriminates that the generated image data follows the same distribution as the true image data under the attribute vector y as much as possible. That is, the parameter of the neural network G serving as the generator 2 is updated so that the discriminator 3 discriminates the generated image as the true image data.

Moreover, the learning unit 30 updates a parameter of the neural network D serving as the discriminator 3 so as to satisfy a constraint that the discriminator 3 discriminates that the generated image data does not follow the same distribution as the true image data under the attribute vector y as much as possible and so as to satisfy a constraint that the discriminator 3 discriminates that the true image data x follows the same distribution as the true image data.

It is to be noted that the optimization conditions under which the neural network G serving as the generator 2 and the neural network D serving as the discriminator 3 contend with each other are represented by the following formula.

[Expression 5]
$\min_{G} \max_{D} \; \mathbb{E}_{x, y \sim P_{data}(x, y)}\left[\log D(x, y)\right] + \mathbb{E}_{z_{i} \sim P_{z_{i}}(z_{i}),\, z_{a} \sim P_{z_{a}}(z_{a}),\, y \sim P_{y}(y)}\left[\log\left(1 - D(G(z_{i}, z_{a}, y), y)\right)\right] \quad (5)$

where

[Expression 6]
$x, y \sim P_{data}(x, y) \quad (6)$

represents that the true image data x and the attribute vector y are sampled from the learning data.

Moreover,

[Expression 7]
$z_{i} \sim P_{z_{i}}(z_{i})$

represents that a latent variable z_(i) representing identity is generated from a given data distribution. For example, the latent variable z_(i) representing identity is generated by a random number.

Moreover,

[Expression 8]
$z_{a} \sim P_{z_{a}}(z_{a})$

represents that a latent variable z_(a) representing an attribute is generated from a given data distribution. For example, the latent variable z_(a) representing the attribute is generated by a random number.

Moreover,

[Expression 9]
$y \sim P_{y}(y)$

represents that the attribute vector y is sampled from the learning data.

Moreover, E represents an expected value.
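To make the two expectation terms of Expression (5) concrete, the following sketch estimates the value of the objective from discriminator outputs on a batch. It is an illustration under assumptions: the function name `value_estimate` is hypothetical, the discriminator outputs are taken to be probabilities strictly between 0 and 1, and the expectations are approximated by sample means.

```python
import math


def value_estimate(d_real, d_fake):
    """Sample-mean estimate of E[log D(x, y)] + E[log(1 - D(G(z_i, z_a, y), y))].

    d_real: discriminator outputs D(x, y) on true image data.
    d_fake: discriminator outputs D(G(z_i, z_a, y), y) on generated image data.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Made-up discriminator outputs for illustration.
undecided = value_estimate([0.5, 0.5], [0.5, 0.5])
confident = value_estimate([0.9, 0.9], [0.1, 0.1])
```

The discriminator is updated to increase this value (true data scored near 1, generated data near 0), while the generator is updated to decrease it, which is the sense in which the two networks contend with each other.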

The learning unit 30 performs the above-described process for each piece of the learning data and iteratively updates the parameter of the neural network G serving as the generator 2 and the parameter of the neural network D serving as the discriminator 3.

The neural network G serving as the generator 2 and the neural network D serving as the discriminator 3 which are finally obtained are stored in the neural network storage unit 40.

Next, as shown in FIG. 2, the learning unit 30 receives image data x included in the input learning data and extracts a latent variable z_(i) representing identity and a latent variable z_(a) representing each attribute using the neural network E serving as the encoder 1.

Moreover, as shown in FIG. 3, the learning unit 30 receives the extracted latent variable z_(i) representing identity and the latent variable z_(a)′ representing each attribute and generates image data G(z_(i), z_(a), y) using the neural network G serving as the generator 2. At this time, the latent variable z_(a)′ representing each attribute is obtained by converting the latent variable z_(a) representing each attribute output by the neural network E serving as the encoder 1 using the value of the attribute vector y. It is to be noted that in FIG. 3, f_(y) is a filter function used for conversion. As an example of the conversion, it is conceivable that the latent variable z_(a) representing each attribute output by the neural network E serving as the encoder 1 is multiplied by the attribute vector y.

Moreover, the learning unit 30 updates the parameter of the neural network E serving as the encoder 1 so as to satisfy a constraint that the generated image data is the same as the original image data x.

The learning unit 30 performs the above-described process for each piece of the learning data and iteratively updates the parameter of the neural network E serving as the encoder 1.

The neural network E serving as the encoder 1 which is finally obtained is stored in the neural network storage unit 40.

The prediction unit 50 inputs image data x of the change target received by the input unit 10 to a pre-learned neural network (e.g., convolutional neural networks (CNNs)) serving as a predictor (not shown) for predicting the attribute vector y and predicts the attribute vector y.

The neural network serving as the predictor outputs the attribute vector y. The attribute vector y is, for example, a classification of the presence or absence of each attribute or positive or negative of each attribute. However, the attribute vector y is not particularly limited thereto.

The variable extraction unit 52 receives the input image data x of the change target and extracts a latent variable z_(i) representing identity of the image data x of the change target and a latent variable z_(a) representing each attribute of the image data x of the change target using the neural network E serving as the encoder 1 stored in the neural network storage unit 40. Moreover, the variable extraction unit 52 obtains a latent variable z_(a)′ representing each attribute on the basis of the extracted latent variable z_(a) representing each attribute and the attribute vector y predicted by the prediction unit 50. At this time, the latent variable z_(a)′ representing each attribute is obtained by converting the latent variable z_(a) representing each attribute extracted by the variable extraction unit 52 using the value of the attribute vector y predicted by the prediction unit 50. As an example of conversion, it is conceivable that the latent variable z_(a) representing each attribute is multiplied by the attribute vector y. Because each attribute is represented by a plurality of latent variables z_(a) for each of the attributes, all of a plurality of latent variables corresponding to each attribute are multiplied by the elements of the attribute vector y.
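The multiplication described above, in which every latent variable belonging to an attribute is multiplied by that attribute's element of y, can be sketched as follows. The function name `apply_attribute_vector` and the nested-list layout (one inner list of latent variables per attribute) are assumptions made for illustration; the outputs of the encoder and the predictor are replaced by made-up values.

```python
def apply_attribute_vector(z_a_groups, y):
    """Multiply all latent variables of attribute j by the element y[j].

    z_a_groups[j] is the list of latent variables z_a for attribute j.
    y[j] is the predicted value of attribute j (1: present, 0: absent).
    """
    if len(z_a_groups) != len(y):
        raise ValueError("one element of y is required per attribute")
    return [[z * y_j for z in group] for group, y_j in zip(z_a_groups, y)]


# Hypothetical encoder output for two attributes, two latent variables each,
# and a hypothetical predicted attribute vector.
z_a = [[0.2, -0.4], [0.9, 0.1]]
y = [1, 0]
z_a_prime = apply_attribute_vector(z_a, y)
```

Here the first attribute is predicted to be present, so its latent variables pass through unchanged, while those of the absent second attribute become zero.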

The signal output unit 53 causes the output unit 90 to display the input image data x of the change target in an image display region 98A of an attribute change screen 92, as shown in FIG. 1.

The signal attribute value display unit 54 causes the output unit 90 to display a latent variable z_(a)′ representing each attribute of the image data x of the change target in a state in which an instruction for changing the value can be received. Specifically, as shown in FIG. 1, the value of the latent variable z_(a)′ is displayed in the attribute change screen 92 by means of any controller such as a radio button 94 or a slide bar 96 indicating the value of the latent variable z_(a)′ representing each attribute in a state in which the instruction for changing the value of the latent variable z_(a)′ representing each attribute can be received.

Moreover, reference image data that are transfer sources of the attributes are displayed in reference image display regions 98B of the attribute change screen 92 and radio buttons 94 for selecting the reference image data are displayed so as to correspond to the reference image display regions 98B. It is to be noted that a method for acquiring reference image data is not particularly limited. For example, reference image data may be acquired as needed from an external network or the like via the input unit 10, or reference image data may be acquired in advance via the input unit 10 and the acquired reference image data may be stored in a storage unit (not shown).

The changed attribute value acquisition unit 56 acquires a changed value of the latent variable z_(a)′ representing the attribute of the change target when the instruction for changing the value of the latent variable z_(a)′ representing the attribute of the change target (e.g., an operation on a radio button 94 or a slide bar 96 indicating the value of the latent variable z_(a)′ representing the attribute) is received in the attribute change screen 92.

When an operation on a radio button 94 has been received, the changed attribute value acquisition unit 56 acquires the changed value of the latent variable z_(a)′ representing the attribute of the change target determined in advance for the operated radio button 94.

Moreover, when an operation on a slide bar 96 corresponding to a latent variable related to the presence or absence of an attribute such as an attribute “bangs” or an attribute “makeup” has been received, the changed attribute value acquisition unit 56 acquires the changed value of the latent variable z_(a)′ representing the attribute of the change target determined in advance for the position of the slide bar 96 that has been operated.

Moreover, when an operation on a slide bar 96 corresponding to a latent variable related to positive or negative of an attribute such as an attribute “male/female” or an attribute “not smile/smile” has been received, the changed attribute value acquisition unit 56 acquires the changed value of the latent variable z_(a)′ representing the attribute of the change target determined in advance for the position of the slide bar 96 that has been operated.

Moreover, when an operation on a radio button 94 for selecting reference image data has been received in the attribute change screen 92, the prediction unit 58 first inputs the selected image data x of a reference target to a pre-learned neural network (CNN) (not shown) for predicting the attribute vector y and predicts the attribute vector y. Moreover, the variable extraction unit 60 receives the selected image data x of the reference target and extracts the latent variable z_(i) representing identity and the latent variable z_(a) representing each attribute of the image data x of the reference target using the neural network E serving as the encoder 1 stored in the neural network storage unit 40. Then, the variable extraction unit 60 obtains the latent variable z_(a)′ representing each attribute of the image data x of the reference target by converting the extracted latent variable z_(a) representing each attribute using the value of the attribute vector y predicted by the prediction unit 58. The changed attribute value acquisition unit 56 acquires the latent variable z_(a)′ representing each attribute of the image data x of the reference target as a changed value of the latent variable z_(a)′ representing the attribute of the change target.

The change unit 62 changes the value of the latent variable z_(a)′ representing the attribute of the change target. Specifically, among the latent variables z_(a)′ representing each attribute obtained by the variable extraction unit 52, the change unit 62 replaces the value of the latent variable z_(a)′ representing the attribute of the change target with the changed value acquired by the changed attribute value acquisition unit 56.

The signal generation unit 64 receives the latent variable z_(i) representing identity extracted by the variable extraction unit 52 and the latent variable z_(a)′ representing each attribute after the change by the change unit 62 and generates image data using the neural network G serving as the generator 2 stored in the neural network storage unit 40.

The changed signal output unit 66 causes the output unit 90 to display the image data generated by the signal generation unit 64 in the image display region 98A of the attribute change screen 92, as shown in FIG. 1.

<Operation of Signal Change Apparatus According to First Embodiment of Present Invention>

Next, an operation of the signal change apparatus 100 according to the first embodiment of the present invention will be described. The signal change apparatus 100 executes a learning process routine and a generation process routine to be described below.

First, the learning process routine will be described. When the input unit 10 receives a plurality of pairs of image data x and an attribute vector y as learning data, the signal change apparatus 100 executes the learning process routine shown in FIG. 6.

First, in step S100, the learning unit 30 acquires any one of a plurality of pieces of learning data received by the input unit 10.

Next, in step S102, the learning unit 30 generates a latent variable z_(i) representing identity and a latent variable z_(a) representing each attribute from a given data distribution.

In step S104, the learning unit 30 obtains a latent variable z_(a)′ representing each attribute by converting the latent variable z_(a) representing each attribute generated in step S102 using the value of the attribute vector y acquired in step S100.
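Steps S102 and S104 can be sketched as follows. This is a minimal illustration under stated assumptions: the sampling distributions, the dimensions, and the use of element-wise multiplication for the conversion are assumed here (multiplication is given later in the description only as one conceivable conversion). The names `z_i`, `z_a`, `y`, and `z_a_prime` mirror the symbols in the text; the array shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_attributes, n_latents_per_attribute, identity_dim = 3, 2, 4

# Step S102: sample latent variables from given distributions (assumed here:
# standard normal for identity, uniform on [0, 1] for attributes).
z_i = rng.standard_normal(identity_dim)
z_a = rng.uniform(0.0, 1.0, size=(n_attributes, n_latents_per_attribute))

# Attribute vector y taken from the learning data: 1 = present, 0 = absent.
y = np.array([1.0, 0.0, 1.0])

# Step S104: convert z_a using the value of y (multiplication assumed), so the
# latent variables of an absent attribute are zeroed out.
z_a_prime = y[:, None] * z_a
```

With this choice, several latent variables per attribute remain free to vary whenever the attribute is present, which matches the idea of representing the diversity of an attribute.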

Then, in step S106, the learning unit 30 receives the latent variable z_(i) representing identity and the latent variable z_(a)′ representing each attribute obtained by the conversion, which have been obtained in the steps S102 and S104, respectively, and generates image data x′ using the neural network G serving as the generator 2.

In step S108, the learning unit 30 updates a parameter of the neural network G serving as the generator 2 and a parameter of the neural network D serving as the discriminator 3 on the basis of the latent variable z_(i) representing identity generated in the step S102, the latent variable z_(a)′ representing each attribute, the image data x′ generated in the step S106, and the image data x and the attribute vector y included in the learning data obtained in step S100.

In step S110, the learning unit 30 determines whether or not the processing of the steps S100 to S108 has been executed on all the pieces of the learning data. When there is learning data on which the processing of the steps S100 to S108 has not been executed, the learning unit 30 returns the processing to the step S100 and acquires the learning data. In contrast, when the processing of the steps S100 to S108 has been executed on all the pieces of the learning data, the learning unit 30 stores the parameter of the neural network G serving as the generator 2 and the parameter of the neural network D serving as the discriminator 3 that have been finally obtained in the neural network storage unit 40.
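The update in step S108 relies on two objectives that contend with each other. The sketch below shows one common way to write such objectives, the standard GAN cross-entropy losses with the non-saturating generator loss. This specific loss form is an assumption and is not stated in this part of the description; `d_real` and `d_fake` are hypothetical names for the outputs of the discriminator D on (x, y) and on (x′, y), interpreted as probabilities of being true image data.

```python
import numpy as np


def discriminator_loss(d_real, d_fake):
    """Loss minimized by D: true pairs should score 1, generated pairs 0."""
    return float(-np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake)))


def generator_loss(d_fake):
    """Loss minimized by G: generated pairs should be scored as true by D."""
    return float(-np.mean(np.log(d_fake)))


# A discriminator that cannot tell true from generated data outputs 0.5.
undecided = np.array([0.5, 0.5])
loss_d = discriminator_loss(undecided, undecided)
loss_g = generator_loss(undecided)
```

Lowering `generator_loss` raises `d_fake`, which in turn raises `discriminator_loss`; this is the sense in which the two conditions contend with each other.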

Next, in step S112, the learning unit 30 acquires any one of the plurality of pieces of learning data received by the input unit 10.

In step S114, the learning unit 30 receives image data x and an attribute vector y included in the learning data obtained in step S112 and extracts the latent variable z_(i) representing identity and the latent variable z_(a) representing each attribute using the neural network E serving as the encoder 1. Moreover, the learning unit 30 receives the extracted latent variable z_(i) representing identity and the latent variable z_(a)′ representing each attribute and generates image data using the neural network G serving as the generator 2. At this time, the latent variable z_(a)′ representing each attribute is obtained by converting the extracted latent variable z_(a) representing each attribute using the value of the attribute vector y of the image data.

In step S116, the learning unit 30 updates a parameter of the neural network E serving as the encoder 1 on the basis of the generated image data and the image data x included in the learning data obtained in step S112.

In step S118, the learning unit 30 determines whether or not the processing of the steps S112 to S116 has been executed on all the pieces of the learning data. When there is learning data on which the processing of the steps S112 to S116 has not been executed, the learning unit 30 returns the processing to the step S112 and acquires the learning data. In contrast, when the processing of the steps S112 to S116 has been executed on all the pieces of the learning data, the learning unit 30 completes the learning process routine and stores the finally obtained parameter of the neural network E serving as the encoder 1 in the neural network storage unit 40.
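Steps S114 to S116 can be illustrated with a toy example in which the generator is frozen and only the encoder is updated so that the regenerated image approaches the input image. Everything concrete here is an assumption for illustration: linear maps `W` (generator) and `V` (encoder) stand in for the neural networks G and E, the reconstruction criterion is squared error, the conversion is multiplication by y, and the update is plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
image_dim, identity_dim, n_attr = 8, 3, 2
latent_dim = identity_dim + n_attr

W = rng.standard_normal((image_dim, latent_dim)) * 0.3  # frozen stand-in for G
V = rng.standard_normal((latent_dim, image_dim)) * 0.1  # stand-in for E (learned)


def reconstruct(V, x, y):
    z = V @ x                                   # encoder output [z_i, z_a]
    mask = np.concatenate([np.ones(identity_dim), y])
    z_prime = mask * z                          # z_a' = y * z_a, z_i unchanged
    return W @ z_prime, mask


def loss(V, x, y):
    x_rec, _ = reconstruct(V, x, y)
    return float(np.sum((x_rec - x) ** 2))


x = rng.standard_normal(image_dim)              # one piece of learning data
y = np.array([1.0, 0.0])                        # its attribute vector

initial = loss(V, x, y)
for _ in range(200):                            # repeated step S116
    x_rec, mask = reconstruct(V, x, y)
    residual = x_rec - x
    grad_V = 2.0 * np.outer(mask * (W.T @ residual), x)
    V -= 0.005 * grad_V                         # only the encoder is updated
final = loss(V, x, y)
```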

Next, the generation process routine will be described. When the input unit 10 receives image data of the change target, the signal change apparatus 100 executes the generation process routine shown in FIG. 7.

In step S150, the signal output unit 53 causes the output unit 90 to display the input image data of the change target in the image display region 98A of the attribute change screen 92, as shown in FIG. 1.

In step S152, the prediction unit 50 predicts the attribute vector y using the pre-learned neural network serving as the predictor on the basis of the image data of the change target received by the input unit 10.

In step S154, the variable extraction unit 52 receives the image data of the change target received by the input unit 10 and extracts a latent variable z_(i) representing identity and a latent variable z_(a) representing each attribute using the neural network E serving as the encoder 1 stored in the neural network storage unit 40. Moreover, the variable extraction unit 52 obtains a latent variable z_(a)′ representing each attribute on the basis of the extracted latent variable z_(a) representing each attribute and the attribute vector y predicted in step S152. At this time, the latent variable z_(a)′ representing each attribute is obtained by converting the extracted latent variable z_(a) representing each attribute using the value of the predicted attribute vector y.

In step S156, the signal attribute value display unit 54 causes the output unit 90 to display the latent variable z_(a)′ representing each attribute of the image data x of the change target obtained in the step S154 in a state in which an instruction for changing the value can be received. Specifically, as shown in FIG. 1, the signal attribute value display unit 54 displays the value of the latent variable z_(a)′ in the attribute change screen 92 by means of a radio button 94 or a slide bar 96 indicating the value of the latent variable z_(a)′ representing each attribute in a state in which the instruction for changing the value of the latent variable z_(a)′ representing each attribute can be received. Moreover, the signal attribute value display unit 54 displays reference image data that are prepared in advance and that serve as transfer sources of the attributes in the reference image display regions 98B of the attribute change screen 92.

In step S158, the changed attribute value acquisition unit 56 acquires a changed value of the latent variable z_(a)′ representing the attribute of the change target when the instruction for changing the value of the latent variable z_(a)′ representing the attribute of the change target (e.g., an operation on the radio button 94 or the slide bar 96 indicating the value of the latent variable z_(a)′ representing the attribute) is received in the attribute change screen 92.

Moreover, when an operation on a radio button 94 for selecting reference image data has been received in the attribute change screen 92, the prediction unit 58 inputs the selected image data x of the reference target to a pre-learned neural network for predicting the attribute vector y and predicts the attribute vector y. Moreover, the variable extraction unit 60 receives the selected image data x of the reference target and extracts the latent variable z_(i) representing identity and the latent variable z_(a) representing each attribute of the image data x of the reference target using the neural network E serving as the encoder 1 stored in the neural network storage unit 40. Then, the variable extraction unit 60 obtains the latent variable z_(a)′ representing each attribute of the image data x of the reference target by converting the extracted latent variable z_(a) representing each attribute using the value of the predicted attribute vector y. The changed attribute value acquisition unit 56 acquires the latent variable z_(a)′ representing each attribute of the image data x of the reference target as the changed value of the latent variable z_(a)′ representing the attribute of the change target.

In step S160, the change unit 62 changes the value of the latent variable z_(a)′ representing the attribute of the change target. Specifically, among the latent variables z_(a)′ representing each attribute obtained in the step S154, the change unit 62 replaces the value of the latent variable z_(a)′ representing the attribute of the change target with the changed value acquired in the step S158.

Then, in step S162, the signal generation unit 64 receives the latent variable z_(i) representing identity extracted in the step S154 and the latent variable z_(a)′ representing each attribute for which the change process has been performed in step S160 and generates image data using the neural network G serving as the generator 2 stored in the neural network storage unit 40.

Then, in step S164, the changed signal output unit 66 causes the output unit 90 to display the generated image data in the image display region 98A of the attribute change screen 92, as shown in FIG. 1, and completes the generation process routine.
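The data flow of steps S152 to S162 can be summarized in a short sketch. The functions `predictor`, `encoder`, and `generator` are trivial stand-ins (assumptions) for the pre-learned predictor, the neural network E, and the neural network G; the attribute names, the single latent variable per attribute, and the multiplicative conversion are likewise assumed for illustration, and display steps S150, S156, and S164 are omitted.

```python
import numpy as np

ATTRIBUTES = ["bangs", "glasses", "smile"]      # hypothetical attribute order


def predictor(x):
    """Stand-in for the pre-learned predictor: returns an attribute vector y."""
    return np.array([1.0, 0.0, 1.0])


def encoder(x):
    """Stand-in for the neural network E: splits x into (z_i, z_a)."""
    return x[:4], x[4:7]


def generator(z_i, z_a_prime):
    """Stand-in for the neural network G: concatenates its inputs."""
    return np.concatenate([z_i, z_a_prime])


def change_signal(x, attribute, changed_value):
    y = predictor(x)                            # step S152
    z_i, z_a = encoder(x)                       # step S154 (extraction)
    z_a_prime = y * z_a                         # step S154 (conversion, assumed)
    index = ATTRIBUTES.index(attribute)
    z_a_prime[index] = changed_value            # steps S158 and S160
    return generator(z_i, z_a_prime)            # step S162; z_i is untouched


x = np.arange(1.0, 8.0)                         # toy "image data" [1, ..., 7]
changed = change_signal(x, "glasses", 0.8)
```

Only the entry for the selected attribute is replaced, so the identity part and the other attributes pass through to the generator unchanged.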

As described above, the signal change apparatus according to the first embodiment of the present invention displays the value of the latent variable representing each attribute extracted with respect to image data of a change target in a state in which an instruction for changing the value can be received. When the instruction for changing the value of the latent variable representing an attribute is received, the signal change apparatus outputs changed image data in which the attribute has been changed on the basis of the changed value of the latent variable representing the attribute. Thereby, it is possible to appropriately change the image data.

Moreover, the signal change apparatus according to the first embodiment of the present invention extracts a latent variable of the input image data using the neural network E serving as the encoder 1, changes the value of the extracted latent variable, and generates image data using the neural network G serving as the generator 2 that takes the changed latent variable as input. Thereby, it is possible to appropriately change image data.

Moreover, the signal change apparatus according to the first embodiment of the present invention can represent the diversity of an attribute because there are a plurality of latent variables for each attribute. Moreover, the signal change apparatus according to the first embodiment of the present invention can control only the value of one of the plurality of latent variables for one attribute. For example, when only an attribute (e.g., glasses) is changed, it is only necessary to interactively change each dimension of a multi-dimensional latent variable z_(a) while fixing the latent variable z_(i) representing identity. When only identity is changed while the attributes are maintained, it is only necessary to change the latent variable z_(i) representing identity while fixing the latent variables z_(a) representing each attribute.

Moreover, the signal change apparatus according to the first embodiment of the present invention generates a latent variable representing identity and a latent variable representing each attribute in image data. It then learns the neural network G serving as the generator 2 and the neural network D serving as the discriminator 3, which discriminates whether or not the generated image data follows the same distribution as true image data under the attribute vector, in accordance with optimization conditions that contend with each other. This learning is performed on the basis of the input true image data, the input attribute vector representing each attribute in image data intended to be generated, the generated latent variable representing identity, and the generated latent variable representing each attribute. Thereby, it is possible to learn the neural network G serving as the generator 2 capable of generating image data while controlling attributes, together with the neural network D serving as the discriminator 3.

It is to be noted that the above-described embodiment describes an example in which the neural network G serving as the generator 2 and the neural network D serving as the discriminator 3 are learned in accordance with optimization conditions that contend with each other. However, the constraint is not limited thereto. For example, a constraint may be further provided so that each latent variable represents something independent of the others. Specifically, as shown in the following formula, a constraint is further provided so that a correlation (information amount) between the latent variable z_(a)′ and image data generated from the latent variable z_(a)′ becomes large.

[Expression 10]

$\begin{aligned}
I\left( z_{a}^{\prime}; G(z_{i}, z_{a}, y) \mid y \right)
&= H\left( z_{a}^{\prime} \mid y \right) - H\left( z_{a}^{\prime} \mid G(z_{i}, z_{a}, y), y \right) \\
&= H\left( z_{a}^{\prime} \mid y \right) + \mathbb{E}_{x \sim G(z_{i}, z_{a}, y)}\left[ \mathbb{E}_{\hat{z}_{a}^{\prime} \sim P(z_{a}^{\prime} \mid x, y)}\left[ \log P\left( \hat{z}_{a}^{\prime} \mid x, y \right) \right] \right] \\
&= H\left( z_{a}^{\prime} \mid y \right) + \mathbb{E}_{x \sim G(z_{i}, z_{a}, y)}\left[ D_{KL}\left( P(\cdot \mid x, y) \,\|\, Q(\cdot \mid x, y) \right) + \mathbb{E}_{\hat{z}_{a}^{\prime} \sim P(z_{a}^{\prime} \mid x, y)}\left[ \log Q\left( \hat{z}_{a}^{\prime} \mid x, y \right) \right] \right] \\
&\geqq H\left( z_{a}^{\prime} \mid y \right) + \mathbb{E}_{x \sim G(z_{i}, z_{a}, y)}\left[ \mathbb{E}_{\hat{z}_{a}^{\prime} \sim P(z_{a}^{\prime} \mid x, y)}\left[ \log Q\left( \hat{z}_{a}^{\prime} \mid x, y \right) \right] \right] \\
&= H\left( z_{a}^{\prime} \mid y \right) + \mathbb{E}_{z_{a}^{\prime} \sim P(z_{a}^{\prime} \mid y),\, x \sim G(z_{i}, z_{a}^{\prime})}\left[ \log Q\left( z_{a}^{\prime} \mid x, y \right) \right]. \qquad (6)
\end{aligned}$

It is to be noted that I(z_(a)′; G(z_(i), z_(a), y)|y) represents the amount of mutual information between the latent variable z_(a)′ and the image data G(z_(i), z_(a), y) when the attribute vector y is given. H represents conditional entropy. D_(KL) represents Kullback-Leibler divergence. P(z_(a)′|x, y) represents a distribution of latent variables z_(a)′ when the image data x and the attribute vector y are given. ẑ_(a)′ (a circumflex is attached above z_(a)) is a latent variable obtained in accordance with the distribution P(z_(a)′|x, y).

Because P(z_(a)′|x, y) is unknown, it is difficult to directly obtain the amount of information I. Thus, as described above, an approximation distribution Q(z_(a)′|x, y) for approximating P(z_(a)′|x, y) is introduced, a neural network for estimating the approximation distribution Q(z_(a)′|x, y) is learned, and the optimization conditions that contend with each other are optimized so that a lower limit of the amount of information I is maximized using the calculus of variations. Thereby, when a plurality of latent variables for the attribute “glasses” include a latent variable z_(a)¹ and a latent variable z_(a)², and the latent variable z_(a)¹ related to the attribute “glasses” represents sunglasses, the latent variable z_(a)² represents glasses other than sunglasses.
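The lower limit in Expression (6), H(z_(a)′|y) plus the expected value of log Q(z_(a)′|x, y), can be estimated from samples. The sketch below assumes a discrete (categorical) latent variable and assumes that the approximation distribution Q is available as a table of predicted probabilities for each generated image; the function name `info_lower_bound` and the example numbers are hypothetical.

```python
import numpy as np


def info_lower_bound(entropy_z, q_probs, sampled_codes):
    """Monte Carlo estimate of H(z_a'|y) + E[log Q(z_a'|x, y)].

    q_probs[n, k]    : Q's probability of code k for the n-th generated image
    sampled_codes[n] : the code z_a' that was actually used to generate image n
    """
    rows = np.arange(len(sampled_codes))
    log_q = np.log(q_probs[rows, sampled_codes])
    return entropy_z + float(np.mean(log_q))


codes = np.array([0, 1, 0])          # codes drawn uniformly from {0, 1}
entropy = np.log(2.0)                # entropy of a uniform binary code

# Q recovers the code from the generated image fairly well.
q_informative = np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]])
# Q cannot recover the code at all.
q_uninformative = np.full((3, 2), 0.5)

bound_informative = info_lower_bound(entropy, q_informative, codes)
bound_uninformative = info_lower_bound(entropy, q_uninformative, codes)
```

The estimate rises as Q becomes better at recovering the latent variable from the generated image, so maximizing it pushes the generator to keep z_(a)′ visible in its output.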

Moreover, the first embodiment describes an example in which the neural network E serving as the encoder 1 simultaneously estimates the latent variable z_(a) representing the attribute and the latent variable z_(i) representing identity. However, a method for estimating the latent variables is not limited thereto. For example, the neural network E serving as the encoder 1 may simultaneously estimate the latent variable z_(a)′ representing the attribute and the latent variable z_(i) representing identity by directly estimating the latent variable z_(a)′ representing the attribute instead of the latent variable z_(a) representing the attribute.

Moreover, if the neural network for estimating the approximation distribution Q(z_(a)′|x, y) is learned together when the neural network G serving as the generator 2 is learned, the latent variable z_(a)′ representing the attribute may be estimated using the neural network for estimating the approximation distribution and the neural network E serving as the encoder 1 may estimate only the latent variable z_(i) representing identity.

Moreover, the optimum latent variable z_(i) representing identity may be obtained by inputting any latent variable z_(i) representing identity to the neural network G serving as the generator 2 without using the neural network E serving as the encoder 1 and updating the latent variable z_(i) representing identity using a gradient method so that the output of the neural network G serving as the generator 2 is close to the target image x. Moreover, the optimum latent variable z_(i) representing identity may be obtained by obtaining a latent variable z_(a)′ representing an attribute and a latent variable z_(i) representing identity using the neural network E serving as the encoder 1, setting the latent variable z_(a)′ and the latent variable z_(i) as initial values, inputting the latent variable z_(i) representing identity to the neural network G serving as the generator 2, and updating the latent variable z_(i) representing identity using a gradient method so that the output of the generator 2 is close to the target image x. Moreover, the neural network E serving as the encoder 1 or the neural network serving as the predictor may be learned together with the neural network G serving as the generator 2 and the neural network D serving as the discriminator 3.
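Obtaining the latent variable z_(i) with a gradient method, without the encoder, can be illustrated on a toy generator. The sketch assumes a linear generator G(z_(i), z_(a)′) = A z_(i) + B z_(a)′ with small hand-picked matrices, a squared-error criterion, and plain gradient descent with a fixed step size; a real neural network G would require automatic differentiation instead of the closed-form gradient used here.

```python
import numpy as np

# Toy linear generator G(z_i, z_a') = A @ z_i + B @ z_a' (assumed stand-in).
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
B = np.array([[1.0], [0.0], [1.0]])


def generator(z_i, z_a_prime):
    return A @ z_i + B @ z_a_prime


z_a_prime = np.array([0.5])                  # attribute latent, kept fixed
z_true = np.array([0.5, -1.0])               # identity latent behind the target
x_target = generator(z_true, z_a_prime)      # target image x

z_i = np.zeros(2)                            # arbitrary initial value
step = 0.1
for _ in range(100):
    residual = generator(z_i, z_a_prime) - x_target
    grad = 2.0 * A.T @ residual              # gradient of ||G(z_i, z_a') - x||^2
    z_i = z_i - step * grad                  # only z_i is updated

final_error = float(np.sum((generator(z_i, z_a_prime) - x_target) ** 2))
```

Starting the same loop from the encoder's estimate instead of zeros corresponds to the second variant described above, in which the encoder output serves as the initial value.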

Next, a signal change apparatus according to a second embodiment of the present invention will be described. It is to be noted that because components of the signal change apparatus according to the second embodiment are similar to those of the signal change apparatus according to the first embodiment, the components are denoted by the same reference signs and a description thereof will be omitted.

The second embodiment is different from the first embodiment in that latent variables representing each attribute are hierarchically structured.

<Overview of Second Embodiment of Present Invention>

Next, an overview of the second embodiment of the present invention will be described.

In order to achieve hierarchical control of attributes, the second embodiment has a structure in which a latent variable representing each attribute is hierarchically converted into latent variables of two or more layers as shown in FIG. 8. Moreover, a latent variable c₁ of a first layer represents each attribute and corresponds to the attribute vector y in the first embodiment. The latent variable represents, for example, the presence or absence of an attribute or positive or negative of an attribute. However, latent variables are not particularly limited thereto.

A latent variable c₂ of a second layer is converted using the value of the latent variable c₁ of the first layer and a conversion result c₂′ is obtained. Moreover, a latent variable c₃ of a third layer is converted using the value of the conversion result c₂′ for the latent variable c₂ of the second layer and a conversion result c₃′ is obtained. Then, the signal change apparatus 100 receives a latent variable z₃ representing identity and the conversion result c₃′ and generates image data using a neural network G₃ serving as the generator.
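The chain of conversions c₁ → c₂′ → c₃′ can be sketched with arrays. The conversion is assumed here to be multiplication, which the description mentions as one conceivable example, and the shapes are hypothetical: each attribute has two second-layer variables, and each of those has three third-layer variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# First layer: presence (1) or absence (0) of each of two attributes.
c1 = np.array([1.0, 0.0])

# Second layer: two latent variables per attribute (one-hot here).
c2 = np.array([[1.0, 0.0],
               [0.0, 1.0]])

# Third layer: three latent variables per second-layer variable.
c3 = rng.uniform(0.0, 1.0, size=(2, 2, 3))

# Conversion by multiplication (assumed): a deeper variable stays active only
# when the variable above it in the hierarchy is active.
c2_prime = c1[:, None] * c2
c3_prime = c2_prime[:, :, None] * c3
```

In this example only the branch under attribute 0 and its first second-layer variable survives in `c3_prime`, which is the kind of gating that lets a shallow layer hold the abstract concept and deeper layers refine it.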

Moreover, in learning of the neural networks, as shown in FIG. 9, the signal change apparatus 100 receives the latent variable c₁ of the first layer and a latent variable z₁ representing identity, and learns a neural network G₁ for generating image data, a neural network D₁ serving as a discriminator, and a neural network Q₁ serving as an approximation distribution. Moreover, the signal change apparatus 100 receives the conversion result c₂′ for the latent variable of the second layer and a latent variable z₂ representing identity, and learns a neural network G₂ for generating image data, a neural network D₂ serving as a discriminator, and a neural network Q₂ serving as an approximation distribution. Furthermore, the signal change apparatus 100 receives the conversion result c₃′ for the latent variable of the third layer and a latent variable z₃ representing identity, and learns the neural network G₃ for generating image data, a neural network D₃ serving as a discriminator, and a neural network Q₃ serving as an approximation distribution. It is to be noted that in FIG. 9, P₁, P₂, and P₃ are discrimination results by the neural networks D₁, D₂, and D₃ serving as the discriminators, respectively. Moreover, c₁, c₂′, and c₃′ respectively obtained in the first to third layers are latent variables representing attributes predicted by the neural networks Q₁, Q₂, and Q₃ serving as the approximation distributions.

In this manner, the signal change apparatus 100 learns the neural network serving as the generator, the neural network serving as the discriminator, and the neural network serving as the approximation distribution on a layer-by-layer basis by initially learning the neural networks corresponding to the latent variable of the first layer and recursively learning the neural networks corresponding to a latent variable of a deeper layer by one layer on the basis of a learning result. Thereby, an abstract concept is first acquired in a shallow layer and the concept can be gradually detailed as the layer becomes deeper.
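The layer-by-layer schedule can be outlined as a simple loop in which each layer is learned using the approximation distribution obtained for the layer above it. This is only a structural sketch: `train_layer` is a hypothetical placeholder that records which networks would be learned and does not perform any actual learning.

```python
def train_layer(layer, parent_q):
    """Placeholder for learning G, D, and Q of one layer.

    parent_q names the approximation distribution of the shallower layer whose
    predictions condition this layer (None for the first layer).
    """
    return {
        "G": f"G{layer}",
        "D": f"D{layer}",
        "Q": f"Q{layer}",
        "parent_q": parent_q,
    }


def train_hierarchy(n_layers=3):
    learned, parent_q = [], None
    for layer in range(1, n_layers + 1):     # shallow layers first
        result = train_layer(layer, parent_q)
        learned.append(result)
        parent_q = result["Q"]               # the next layer builds on this one
    return learned


schedule = train_hierarchy()
```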

<Configuration of Signal Change Apparatus According to Second Embodiment of Present Invention>

In the signal change apparatus 100 according to the second embodiment of the present invention, an input unit 10 receives a plurality of pieces of image data x as learning data. Moreover, the input unit 10 receives image data x that is a change target.

First, a learning unit 30 generates latent variables z₁, z₂, and z₃ representing identity and latent variables c₁, c₂, and c₃ representing each attribute in layers from given data distributions. Each latent variable represents, for example, the presence or absence or positive or negative of an attribute in each layer. However, latent variables are not particularly limited thereto. The learning unit 30 may generate the latent variables z₁, z₂, and z₃ representing identity and the latent variables c₁, c₂, and c₃ representing each attribute in respective layers using random numbers. The learning unit 30 receives true image data x included in the input learning data, the generated latent variables z₁, z₂, and z₃ representing identity, and the generated latent variables c₁, c₂, and c₃ representing each attribute in the respective layers. Then, the learning unit 30 learns a neural network (e.g., a CNN) serving as a generator for generating image data and a neural network (e.g., a CNN) serving as a discriminator for discriminating whether or not the generated image data follows the same distribution as the true image data from the latent variables z₁, z₂, and z₃ representing identity and the latent variables c₁, c₂, and c₃ representing each attribute in accordance with optimization conditions that contend with each other. At the same time, the learning unit 30 performs learning so that a lower limit of an amount of information is maximized with respect to a neural network (e.g., a CNN) serving as an approximation distribution for estimating a latent variable representing each attribute with respect to the generated image data. The learning unit 30 performs the above-described process recursively with respect to each layer.

Specifically, with respect to the first layer, the learning unit 30 first receives the true image data x included in the input learning data, the generated latent variable z₁ representing identity, and the latent variable c₁ representing each attribute of the first layer.

Then, the learning unit 30 generates image data using the generated latent variable z₁ representing identity, the latent variable c₁ representing each attribute of the first layer, and the neural network G₁ serving as a generator.

Then, the learning unit 30 updates a parameter of the neural network G₁ serving as the generator so as to satisfy a constraint that the discriminator discriminates that the generated image data follows the same distribution as the true image data as much as possible. That is, the parameter of the neural network G₁ is updated so that the discriminator discriminates that the generated image data is the true image data x.

Moreover, the learning unit 30 updates a parameter of the neural network D₁ serving as the discriminator so as to satisfy a constraint that the discriminator discriminates that the generated image data does not follow the same distribution as the true image data x as much as possible and so as to satisfy a constraint that the discriminator discriminates that the true image data x follows the same distribution as the true image data.

Moreover, the learning unit 30 updates a parameter of the neural network Q₁ serving as the approximation distribution so that a lower limit of a correlation (information amount) between the latent variable c₁ and image data generated from the latent variable c₁ is maximized with respect to the neural network Q₁ serving as the approximation distribution for predicting the latent variable c₁ representing each attribute of the first layer with respect to the generated image data.

Next, with respect to the second layer, the learning unit 30 receives the true image data x included in the input learning data, the latent variable c₁ representing each attribute of the first layer predicted by the neural network Q₁ serving as the approximation distribution, the generated latent variable z₂ representing identity, and the latent variable c₂ representing each attribute of the second layer. At this time, the latent variable c₂′ representing each attribute of the second layer is obtained by converting the latent variable c₂ representing each attribute of the second layer using the value of the latent variable c₁ representing each attribute of the first layer. As an example of conversion, it is conceivable that the latent variable c₂ representing each attribute of the second layer is multiplied by the latent variable c₁ representing each attribute of the first layer.

Moreover, the learning unit 30 generates image data using the generated latent variable z₂ representing identity, the conversion result c₂′ for the latent variable c₂ representing each attribute of the second layer, and the neural network G₂ serving as the generator.

Then, the learning unit 30 updates a parameter of the neural network G₂ serving as the generator so as to satisfy a constraint that the discriminator discriminates that the generated image data follows the same distribution as the true image data under the latent variable c₁ representing each attribute of the first layer as much as possible. That is, the parameter of the neural network G₂ is updated so that the discriminator discriminates that the generated image data is the true image data.

Moreover, the learning unit 30 updates a parameter of the neural network D₂ serving as the discriminator so as to satisfy a constraint that the discriminator discriminates that the generated image data does not follow the same distribution as the true image data under the latent variable c₁ representing each attribute of the first layer as much as possible and so as to satisfy a constraint that the discriminator discriminates that the true image data x follows the same distribution as the true image data.

Moreover, the learning unit 30 updates a parameter of the neural network Q₂ serving as the approximation distribution so that a lower limit of a correlation (information amount) between the latent variable c₂′ and image data generated from the latent variable c₂′ is maximized with respect to the neural network Q₂ serving as the approximation distribution for predicting the latent variable c₂′ representing each attribute of the second layer with respect to the generated image data under the latent variable c₁ representing each attribute of the first layer.

Next, with respect to the third layer, the learning unit 30 receives the true image data x included in the input learning data, the latent variable c₂′ representing each attribute of the second layer predicted by the neural network Q₂ serving as the approximation distribution, the generated latent variable z₃ representing identity, and the latent variable c₃ representing each attribute of the third layer. At this time, the latent variable c₃′ representing each attribute of the third layer is obtained by converting the latent variable c₃ representing each attribute of the third layer using the value of the conversion result c₂′ for the latent variable c₂ representing each attribute of the second layer.

Moreover, the learning unit 30 generates image data using the generated latent variable z₃ representing identity, the conversion result c₃′ for the latent variable c₃ representing each attribute of the third layer, and the neural network G₃ serving as the generator.

Then, the learning unit 30 updates a parameter of the neural network G₃ serving as the generator so as to satisfy a constraint that the discriminator discriminates that the generated image data follows the same distribution as the true image data under the conversion result c₂′ for the latent variable c₂ representing each attribute of the second layer as much as possible. That is, the parameter of the neural network G₃ is updated so that the discriminator discriminates that the generated image data is the true image data.

Moreover, the learning unit 30 updates a parameter of the neural network D₃ serving as the discriminator so as to satisfy a constraint that the discriminator discriminates that the generated image data does not follow the same distribution as the true image data under the conversion result c₂′ for the latent variable c₂ representing each attribute of the second layer as much as possible and so as to satisfy a constraint that the discriminator discriminates that the true image data x follows the same distribution as the true image data.

Moreover, the learning unit 30 updates a parameter of the neural network Q₃ serving as the approximation distribution so that a lower limit of a correlation (information amount) between the latent variable c₃′ and image data generated from the latent variable c₃′ is maximized with respect to the neural network Q₃ serving as the approximation distribution for predicting the latent variable c₃′ representing each attribute of the third layer with respect to the generated image data under the conversion result c₂′ for the latent variable c₂ representing each attribute of the second layer.

The learning unit 30 performs the above-described process for each pieceof the learning data and iteratively updates the parameters of varioustypes of neural networks.

The neural networks G₁, G₂, and G₃ serving as the generators, the neural networks D₁, D₂, and D₃ serving as the discriminators, and the neural networks Q₁, Q₂, and Q₃ serving as the approximation distributions which are finally obtained are stored in the neural network storage unit 40.

Next, the learning unit 30 receives image data x included in the input learning data, estimates the latent variables z₁, z₂, and z₃ representing identity using a neural network serving as an encoder, and extracts the latent variables c₁, c₂′, and c₃′ representing each attribute using the neural networks Q₁, Q₂, and Q₃ serving as the approximation distributions.

Moreover, the learning unit 30 receives the estimated latent variable z₃ representing identity and the extracted latent variable c₃′ representing each attribute, and generates image data using the neural network G₃ serving as the generator.

Moreover, the learning unit 30 updates a parameter of the neural network serving as an encoder so as to satisfy a constraint that the generated image data is the same as the original image data x.

The learning unit 30 performs the above-described process for each piece of the learning data and iteratively updates the parameter of the neural network serving as the encoder.

The neural network serving as the encoder that is finally obtained is stored in the neural network storage unit 40.

The prediction unit 50 inputs the image data of the change target received by the input unit 10 to the neural network Q₁ serving as the approximation distribution for predicting the latent variable c₁ representing each attribute of the first layer and predicts the latent variable c₁ representing each attribute of the first layer.

The prediction unit 50 outputs the latent variable c₁ representing each attribute of the first layer to the variable extraction unit 52.

The variable extraction unit 52 receives the input image data x of the change target, estimates latent variables z₁, z₂, and z₃ representing identity of the image data x of the change target using the neural network serving as an encoder stored in the neural network storage unit 40, and extracts latent variables c₂′ and c₃′ representing the attribute of the second and subsequent layers using the neural networks Q₂ and Q₃ serving as the approximation distributions. The variable extraction unit 52 outputs the latent variable c₁ predicted by the prediction unit 50 and the extracted latent variables c₂′ and c₃′ to the signal attribute value display unit 54.

The signal attribute value display unit 54 causes the output unit 90 to display the latent variables c₁, c₂′, and c₃′ representing each attribute of the image data x of the change target in a state in which an instruction for changing the values can be received. Specifically, the signal attribute value display unit 54 displays the values of the latent variables c₁, c₂′, and c₃′ in the attribute change screen 92 by means of radio buttons 94 or slide bars 96 indicating the values of the latent variables c₁, c₂′, and c₃′ representing each attribute in a state in which an instruction for changing the values of the latent variables c₁, c₂′, and c₃′ representing each attribute can be received.

Moreover, reference image data that are transfer sources of the attributes are displayed in the reference image display regions 98B of the attribute change screen 92 and radio buttons 94 for selecting the reference image data are displayed so as to correspond to the reference image display regions 98B.

The changed attribute value acquisition unit 56 acquires changed values of the latent variables c₁, c₂′, and c₃′ representing the attribute of the change target when the instruction for changing the values of the latent variables c₁, c₂′, and c₃′ representing the attribute of the change target (e.g., an operation on the radio button 94 or the slide bar 96 indicating the value of the latent variable c₃′ representing each attribute) is received in the attribute change screen 92. It is to be noted that when an instruction for changing the value of the latent variable c₁ representing an attribute is received, the values of the associated latent variables c₂′ and c₃′ representing the attribute are also changed. Moreover, when an instruction for changing the value of the latent variable c₂′ representing the attribute is received, the value of the associated latent variable c₃′ representing the attribute is also changed.

Moreover, when an operation on a radio button 94 for selecting reference image data is received in the attribute change screen 92, the prediction unit 58 first inputs the selected image data of the reference target to the neural network Q₁ serving as an approximation distribution for predicting the latent variable c₁ representing each attribute of the first layer and predicts the latent variable c₁ representing each attribute of the first layer. Moreover, the variable extraction unit 60 receives the selected image data x of the reference target, estimates latent variables z₁, z₂, and z₃ representing identity of the image data x of the reference target using the neural network serving as an encoder stored in the neural network storage unit 40, and extracts the latent variables c₂′ and c₃′ representing each attribute using the neural networks Q₂ and Q₃ serving as approximation distributions. Then, the changed attribute value acquisition unit 56 acquires the obtained latent variable c₃′ representing each attribute as the changed value of the latent variable c₃′ representing the attribute of the change target.

The change unit 62 changes the latent variable c₃′ representing the attribute of the change target by replacing, among the latent variables c₁, c₂′, and c₃′ representing each attribute obtained by the prediction unit 50 and the variable extraction unit 52, the latent variable c₃′ representing the attribute of the change target with the changed value acquired by the changed attribute value acquisition unit 56.

The signal generation unit 64 receives the latent variable z₃ representing identity estimated by the variable extraction unit 52 and the latent variable c₃′ representing each attribute after the change by the change unit 62 and generates image data using the neural network G₃ serving as the generator stored in the neural network storage unit 40.
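
The replacement performed by the change unit 62 and the generation performed by the signal generation unit 64 can be sketched as follows. This is only an illustrative sketch: the function names, the index-to-value dictionary used to describe a change instruction, and the `toy_generator` stand-in are hypothetical, and the actual generator G₃ is a trained neural network.

```python
from typing import Callable, Dict, List, Sequence


def change_attributes(c3_prime: Sequence[float], changed: Dict[int, float]) -> List[float]:
    """Return a copy of c3' in which the entries named in `changed` are replaced."""
    out = list(c3_prime)
    for index, value in changed.items():
        out[index] = value
    return out


def generate_changed_signal(
    z3: Sequence[float],
    c3_prime: Sequence[float],
    changed: Dict[int, float],
    generator: Callable[[Sequence[float], Sequence[float]], List[float]],
) -> List[float]:
    """Replace the changed attribute values, then call the generator on (z3, c3')."""
    return generator(z3, change_attributes(c3_prime, changed))


def toy_generator(z3: Sequence[float], c3_prime: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for G3: concatenates its inputs to expose the data flow."""
    return list(z3) + list(c3_prime)
```

The identity variable z₃ is passed through untouched, which reflects the point of the description: only the attribute values are edited before generation.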

It is to be noted that because the other configuration and operation of the signal change apparatus 100 according to the second embodiment are similar to those of the first embodiment, a description thereof will be omitted.

As described above, the signal change apparatus according to the second embodiment changes the value of the conversion result of the latent variable representing each attribute of the third layer in accordance with the change instruction, receives a latent variable representing identity extracted by the neural network serving as the encoder and the conversion result of the latent variable representing each attribute of the third layer after the change, and generates image data using the neural network serving as the generator. Thereby, it is possible to appropriately change image data.

It is to be noted that the present invention is not limited to the above-described embodiments and various modifications and applications can be made without departing from the gist of the present invention.

For example, in the above-described embodiments, the signal change apparatuses 100 are configured so as to include the learning unit 30 for performing learning of the neural networks. However, the present invention is not limited to such a configuration. For example, a learning apparatus including the learning unit 30 may be provided separately from the signal change apparatus and the signal change apparatus may be configured to include a prediction unit, a variable extraction unit, a signal output unit, a signal attribute value display unit, a changed attribute value acquisition unit, a change unit, a signal generation unit, and a changed signal output unit.

Moreover, the above-described embodiments describe examples in which the input signal is face image data. However, the input signal is not limited thereto and may be image data other than the face image data. For example, the input signal may be character image data. In this case, an attribute vector y represents the presence or absence of each of the types of characters (e.g., a, b, c, . . . 1, 2, . . . ), and a latent variable z_(a) representing the attribute represents the diversity within a character (e.g., representing “What type of character is 4?”). Alternatively, the attribute vector y represents the presence or absence of each of the character fonts (e.g., a Gothic style, a Mincho style, a bold style, an italic style, and the like), and the latent variable z_(a) representing the attribute represents the diversity within a font (e.g., representing “What Gothic style is it?”).

Moreover, the input signal may be animal image data (e.g., bird image data). In this case, the attribute vector y represents the presence or absence of a color (e.g., red), and the latent variable z_(a) representing the attribute represents the diversity within the color (e.g., representing “How red is the bird in what part?”). Alternatively, the attribute vector y represents the shape of a part (e.g., a bill is round/pointed) and the latent variable z_(a) representing the attribute represents the diversity within the part (e.g., representing “How round is the bill?”).

Moreover, the input signal may be background image data. In this case, the attribute vector y represents the type of background (e.g., a sea, a mountain, a river, a house, or a road), and the latent variable z_(a) representing the attribute represents the diversity within the background (e.g., representing “What type of sea is it?”).

Moreover, the input signal may be house image data. In this case, the attribute vector y represents the presence or absence of a color (e.g., red), and the latent variable z_(a) representing the attribute represents the diversity within the color (e.g., representing “How red is the house in what part?”).

Moreover, the input signal may be structure image data. In this case, the attribute vector y represents the presence or absence of the type of structure (e.g., a building, a detached house, or a tower), and the latent variable z_(a) representing the attribute represents the diversity within the structure (e.g., representing “What type of building is it?”). Alternatively, the attribute vector y represents the presence or absence of the shape of a part (e.g., a roof is flat, a roof is round, or a roof is triangular) and the latent variable z_(a) representing the attribute represents the diversity within the part (e.g., representing “How flat is the roof?”).

Moreover, the above-described embodiments describe examples in which the input signal is image data. However, the input signal is not limited thereto and may be a signal other than image data. For example, the input signal may be an audio signal (or a music signal), text data, or moving-image data.

When an audio signal is input, the signal change apparatus 100 can reconfigure the audio signal by changing a latent variable representing an attribute after extracting a latent variable representing identity and a latent variable representing an attribute (an attribute related to a person who is a generation source of the audio signal (e.g., attractiveness, male/female, young/old, an emotion, a dialect, or the like) or an attribute related to an element constituting the audio signal (e.g., fast/slow or high/low)). In this case, as shown in FIG. 10, the signal output unit 53 or the changed signal output unit 66 causes an audio waveform and an audio spectrogram of audio data of the change target or an audio waveform and an audio spectrogram of audio data after the change to be displayed in audio display regions 298A and 298B of an attribute change screen 292. Moreover, the signal attribute value display unit 54 displays the value of a latent variable z_(a)′ representing each attribute in the attribute change screen 292 by means of a radio button 94 or a slide bar 96 indicating the value of the latent variable z_(a)′ representing each attribute in a state in which an instruction for changing the value of the latent variable z_(a)′ representing each attribute can be received. Moreover, audio waveforms or spectrograms of reference audio signals that are transfer sources of the attributes are displayed in reference audio display regions 298C of the attribute change screen 292 and radio buttons 94 for selecting the reference audio signals are displayed so as to correspond to the reference audio display regions 298C. Moreover, radio buttons 294 for selecting attributes to be transferred from the reference audio signals that are transfer sources of the attributes are displayed on the attribute change screen 292. Moreover, buttons 299 for issuing an instruction for reproducing an audio signal of a change target, an audio signal after the change, or a reference audio signal are also displayed on the attribute change screen 292.

When text data is input, the signal change apparatus 100 can reconfigure the text data by changing a latent variable representing an attribute after extracting a latent variable representing identity and a latent variable representing an attribute (an attribute related to a person who is a generation source of the text data (e.g., the degree of politeness, male/female, or the like) or an attribute related to an element constituting the text data (e.g., abstract/concrete, an emotion, a genre, colloquial/literary, or formal/not formal)). In this case, as shown in FIG. 11, the signal output unit 53 or the changed signal output unit 66 causes text data of the change target or text data after the change to be displayed in a text display region 398A of an attribute change screen 392. Moreover, the signal attribute value display unit 54 displays the value of a latent variable z_(a)′ representing each attribute in the attribute change screen 392 by means of a radio button 94 or a slide bar 96 indicating the value of the latent variable z_(a)′ representing each attribute in a state in which an instruction for changing the value of the latent variable z_(a)′ representing each attribute can be received. Moreover, reference text data that are transfer sources of the attributes are displayed in reference text display regions 398B of the attribute change screen 392 and radio buttons 94 for selecting the reference text data are displayed so as to correspond to the reference text display regions 398B. Moreover, radio buttons 294 for selecting attributes to be transferred from the reference text data that are the transfer sources of the attributes are displayed on the attribute change screen 392.

When moving-image data is input, the signal change apparatus 100 can reconfigure the moving-image data by changing a latent variable representing an attribute after extracting a latent variable representing identity and a latent variable representing an attribute (an attribute related to an element constituting the moving-image data (e.g., a comical degree, old/new, live-action/animation, an emotion, a genre, or the like)). In this case, as shown in FIG. 12, the signal output unit 53 or the changed signal output unit 66 causes moving-image data of the change target or moving-image data after the change to be displayed in a moving-image display region 498A of an attribute change screen 492. Moreover, the signal attribute value display unit 54 displays the value of a latent variable z_(a)′ representing each attribute in the attribute change screen 492 by means of a radio button 94 or a slide bar 96 indicating the value of the latent variable z_(a)′ representing each attribute in a state in which an instruction for changing the value of the latent variable z_(a)′ representing each attribute can be received. Moreover, reference moving-image data that are transfer sources of the attributes are displayed in reference moving-image display regions 498B of the attribute change screen 492 and radio buttons 94 for selecting the reference moving-image data are displayed so as to correspond to the reference moving-image display regions 498B. Moreover, radio buttons 294 for selecting attributes to be transferred from the reference moving-image data that are the transfer sources of the attributes are displayed on the attribute change screen 492. Buttons 499 for issuing an instruction for reproducing moving-image data of a change target, moving-image data after the change, or reference moving-image data are also displayed in the attribute change screen 492.

It is to be noted that the above-described embodiments describe examples in which a latent variable representing each attribute is changed. However, the change target is not limited thereto. For example, an instruction for changing the value of a latent variable representing identity may be received, the latent variables representing each attribute may be maintained as they are, a latent variable representing identity extracted from image data may be changed, and the image data may be reconstructed.

Moreover, the reconstructed image data may be corrected in consideration of a reconstruction error when the image data is reconstructed. For example, image data x_(mod), which is generated using the neural network serving as the generator after a latent variable representing the attribute is changed, may be corrected in accordance with the following formula.

[Expression 11]

$\tilde{x} = x_{rec} + M\Delta + (1 - M)\Delta', \quad \Delta = x_{mod} - x_{rec}, \quad \Delta' = x - x_{rec} \qquad (7)$

where x_(rec) is image data reconstructed without changing a latent variable representing an attribute, x is image data of a change target, and M is a mask image obtained in advance in accordance with the following formula.

[Expression 12]

$M = \min(\alpha \cdot g(|\Delta|; \sigma), 1) \qquad (8)$

Here, g represents a Gaussian distribution, σ represents the variance of the Gaussian distribution for |Δ| [Expression 13], which is an absolute value of an average of Δ with respect to RGB (red, green, blue), and α represents the spread of the Gaussian distribution.

Although an example in which the Gaussian distribution is used to calculate a mask image has been described, any distribution such as a Laplacian distribution may be used. Although an example in which the average value of the absolute values is used for calculating the norm of the difference image has been described, any norm such as an L2 norm may be used. The mask image may be calculated for each luminance value.
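
The correction of Formulas (7) and (8) can be sketched as follows. This sketch rests on assumptions that the description does not spell out: g is taken to be a Gaussian smoothing filter applied to the per-pixel map |Δ|, the parameter `sigma` is treated as the variance of that Gaussian, borders are handled by clamping, and the default values of `alpha` and `sigma` are arbitrary. Images are plain nested lists of shape H × W × 3; all function names are hypothetical.

```python
import math


def gaussian_kernel(variance):
    """Normalized 1-D Gaussian weights; `variance` plays the role of sigma in Formula (8)."""
    radius = max(1, int(math.ceil(3.0 * math.sqrt(variance))))
    weights = [math.exp(-(k * k) / (2.0 * variance)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(img, variance):
    """Separable Gaussian smoothing of a 2-D list, clamping indices at the border."""
    k = gaussian_kernel(variance)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    rows = [[sum(k[j + r] * img[y][min(max(x + j, 0), w - 1)] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j + r] * rows[min(max(y + j, 0), h - 1)][x] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def corrected_image(x, x_rec, x_mod, alpha=4.0, sigma=1.0):
    """Return (x_tilde, M) for H x W x 3 nested lists x, x_rec, and x_mod."""
    h, w = len(x), len(x[0])
    # |Delta|: absolute value of the RGB average of Delta = x_mod - x_rec.
    abs_delta = [[abs(sum(x_mod[i][j][c] - x_rec[i][j][c] for c in range(3)) / 3.0)
                  for j in range(w)] for i in range(h)]
    smooth = blur(abs_delta, sigma)
    # Formula (8): M = min(alpha * g(|Delta|; sigma), 1).
    mask = [[min(alpha * smooth[i][j], 1.0) for j in range(w)] for i in range(h)]
    # Formula (7): x_tilde = x_rec + M * Delta + (1 - M) * Delta'.
    x_tilde = [[[x_rec[i][j][c]
                 + mask[i][j] * (x_mod[i][j][c] - x_rec[i][j][c])
                 + (1.0 - mask[i][j]) * (x[i][j][c] - x_rec[i][j][c])
                 for c in range(3)] for j in range(w)] for i in range(h)]
    return x_tilde, mask
```

Where the attribute change leaves a pixel untouched (M = 0), the corrected image falls back to the original x, so the reconstruction error is cancelled there. Where the change is large (M = 1), the corrected image takes the generated value x_(mod).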

Moreover, the above-described embodiments describe examples in which CNNs are used as the neural networks serving as the discriminators, the generators, the predictors, and the approximation distributions. However, the structure of the neural networks is not limited thereto and other neural network structures may be used. For example, a recurrent neural network (RNN) (e.g., a long short-term memory (LSTM) or the like) which is a model that takes time series into consideration, a fully-connected neural network, or the like may be used.

Moreover, the above-described embodiments describe examples in which the latent variables themselves are output as the output of the neural network serving as the encoder. However, the output of the neural network serving as the encoder is not limited thereto. For example, the output of the neural network serving as the encoder may be a parameter related to the distribution of latent variables (e.g., an average and a standard deviation in the case of a Gaussian distribution) and the latent variables may be obtained by performing sampling in accordance with the parameter related to the distribution.
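
For the Gaussian case mentioned above, obtaining latent variables from an average and a standard deviation can be sketched as follows. The function name `sample_latent` and the per-dimension (diagonal) parameterization are assumptions made for illustration; the description only states that sampling follows the distribution parameters output by the encoder.

```python
import random


def sample_latent(mean, std, rng=None):
    """Draw one latent vector with independent Gaussian entries N(mean[i], std[i]**2)."""
    if rng is None:
        rng = random.Random()
    # Each entry is the mean shifted by standard normal noise scaled by the standard deviation.
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]
```

Passing a seeded `random.Random` instance makes the draw reproducible, which is convenient when the same change target is edited repeatedly.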

Moreover, the second embodiment describes an example in which the neural network serving as the encoder estimates the latent variables z₁, z₂, and z₃ representing identity, predicts the latent variables c₂′ and c₃′ representing attributes using the neural networks Q₂ and Q₃ serving as the approximation distributions, and predicts the latent variable c₁ using the neural network serving as the predictor. However, the neural network serving as the encoder may simultaneously estimate the latent variables c₁, c₂, and c₃ representing attributes and the latent variables z₁, z₂, and z₃ representing identity. Alternatively, the neural network serving as the encoder may directly estimate the latent variables c₂′ and c₃′ representing the attribute instead of the latent variables c₂ and c₃ representing the attribute.

Moreover, the optimum latent variables z₁, z₂, and z₃ representing identity may be obtained by inputting any latent variables z₁, z₂, and z₃ representing identity to the neural network serving as the generator and updating the latent variables z₁, z₂, and z₃ representing identity using a gradient method so that the output is close to a target image x without using the neural network serving as the encoder. Moreover, the optimum latent variables z₁, z₂, and z₃ representing identity may be obtained by obtaining the latent variable c₁ representing the attribute using the neural network serving as the predictor, obtaining the latent variables c₂′ and c₃′ representing the attribute and the latent variables z₁, z₂, and z₃ representing identity using the neural network serving as the encoder, setting the obtained latent variables as initial values, inputting the latent variables z₁, z₂, and z₃ representing identity to the neural network serving as the generator, and updating the latent variables z₁, z₂, and z₃ representing identity using a gradient method so that the output is close to a target image x.
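
The gradient-based search for the identity variables can be sketched on a toy problem as follows. The linear map `W` is a hypothetical stand-in for the generator (the actual generator is a neural network whose gradient would come from backpropagation), the squared error is an assumed measure of closeness to the target image x, and the step size and iteration count are arbitrary.

```python
def generate(W, z):
    """Toy stand-in generator: x = W z."""
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in W]


def fit_identity(W, x_target, z_init, lr=0.05, steps=500):
    """Update z by gradient descent so that generate(W, z) approaches x_target."""
    z = list(z_init)
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(W, z), x_target)]
        # The gradient of ||W z - x||^2 with respect to z is 2 W^T (W z - x).
        grad = [2.0 * sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(z))]
        z = [z_j - lr * g_j for z_j, g_j in zip(z, grad)]
    return z
```

`z_init` may be any starting value, which corresponds to the first variation above, or the encoder's estimate, which corresponds to the second variation in which the encoder output serves as the initial value.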

Moreover, when the latent variables c₂ and c₃ representing the attribute are estimated simultaneously with the latent variables z₁, z₂, and z₃ representing identity using the neural network serving as the encoder, the neural network serving as the encoder obtains the latent variables c₂′ and c₃′ representing the attribute on the basis of the estimated latent variables c₂ and c₃ representing each attribute and the predicted latent variable c₁ representing each attribute of the first layer. The latent variable c₃′ representing each attribute is obtained as follows.

First, the latent variable c₂′ representing each attribute of the second layer is obtained by converting the latent variable c₂ representing each attribute of the second layer using the value of the latent variable c₁ representing each attribute of the first layer. Next, the latent variable c₃′ representing each attribute of the third layer is obtained by converting the latent variable c₃ representing each attribute of the third layer using the value of the conversion result c₂′ of the latent variable c₂ representing each attribute of the second layer.
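
The two-step conversion can be sketched as follows. The conversion operator used here, in which each entry of the shallower layer scales its own group of entries in the deeper layer, is an assumption chosen for illustration; this passage does not fix the operator. The names `convert` and `convert_hierarchy` are hypothetical.

```python
def convert(child_groups, parent):
    """Scale the i-th child group by the i-th parent entry and flatten the result."""
    if len(child_groups) != len(parent):
        raise ValueError("one child group is required per parent entry")
    return [p * v for p, group in zip(parent, child_groups) for v in group]


def convert_hierarchy(c1, c2, c3):
    """Return (c2', c3'): c2 is converted with c1, then c3 is converted with c2'."""
    c2_prime = convert(c2, c1)
    c3_prime = convert(c3, c2_prime)
    return c2_prime, c3_prime
```

Because c₃′ is computed from c₂′, which is in turn computed from c₁, recomputing the hierarchy after c₁ is edited also changes c₂′ and c₃′. This matches the behavior described for the attribute change screen 92.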

Moreover, the neural network serving as the encoder or the neural network serving as the predictor may be learned together with the neural network serving as the generator and the neural network serving as the discriminator.

Moreover, the second embodiment describes an example in which the latent variable c₁ representing each attribute of the first layer is predicted from the image data included in the learning data and is used for learning. However, a method for obtaining the latent variable c₁ representing each attribute of the first layer is not limited thereto. For example, when learning is performed, a latent variable c₁ representing each attribute of the first layer may be input as learning data.

Moreover, in addition to the latent variable c₁ representing each attribute of the first layer, the latent variable representing each attribute of any layer may also be input as learning data to learn a deeper layer.

Moreover, the first embodiment describes an example in which the attribute vector y is assigned to all the pieces of the image data x included in the learning data. However, the image data x to which the attribute vector y is assigned is not limited thereto. For example, the attribute vector y may be assigned only to part of image data x included in learning data. Alternatively, the learning data may not include the attribute vector y. In this case, the signal change apparatus may estimate the attribute vector y as in the signal change apparatus of the second embodiment that estimates the latent variable c₁ representing each attribute corresponding to the attribute vector y. The signal change apparatus may learn each neural network on the basis of the estimated attribute vector y. Likewise, even in the case of the second embodiment, latent variables may be assigned only to part of image data x included in the learning data when latent variables representing each attribute of any layer are input as learning data.

A case in which the signal change apparatus of the second embodiment obtains a latent variable c_(i)′ by converting a latent variable c_(i) using the value of a latent variable c_(i-1)′ representing each attribute of a layer directly before a current layer has been described. However, the signal change apparatus is not limited thereto and the latent variable c_(i)′ may be obtained by converting the latent variable c_(i) using at least one of latent variables c_(j)′ (j=1, 2, . . . , i−1) of a layer shallower than that corresponding to the latent variable c_(i)′. For example, when the latent variable c_(i)′ is obtained, the signal change apparatus may obtain the latent variable c_(i)′ by converting the latent variable c_(i) using a latent variable c_(i-2)′ of a layer that is two layers shallower than that corresponding to the latent variable c_(i)′. Furthermore, the signal change apparatus may obtain the latent variable on the basis of a predetermined relationship between the latent variable c_(j)′ (j=1, 2, . . . , i−1) of a layer shallower than that corresponding to the latent variables c_(i)′ and the latent variables c_(i)′.

In the generator 2 in the signal change apparatus of the first embodiment, a process of converting the latent variable z_(a) using the attribute vector y may be performed by a neural network. The learning unit may perform learning of the neural network that converts the latent variable z_(a) together with learning of the neural network G serving as the generator. In the generator 2 in the signal change apparatus of the second embodiment, a process of obtaining the latent variable c_(i)′ by converting the latent variable c_(i) using the latent variable c_(i-1)′ may be performed by a neural network. The learning unit may perform learning of the neural network obtaining the latent variable c_(i)′ together with learning of the neural network G_(i) serving as the generator.

A case in which the signal change apparatus in the first embodiment generates the latent variables z_(i) and z_(a) from a data distribution such as a categorical distribution or a uniform distribution has been described. However, a method for generating the latent variables z_(i) and z_(a) is not limited thereto. For example, the signal change apparatus may generate the latent variables z_(i) and z_(a) on the basis of any distribution including a Gaussian distribution, a Dirichlet distribution, or the like. Likewise, the signal change apparatus in the second embodiment may generate latent variables z₁, z₂, z₃, c₁, c₂, and c₃ on the basis of any distribution including a Gaussian distribution, a Dirichlet distribution, or the like. Alternatively, the signal change apparatuses in the first and second embodiments may be provided with a neural network for generating each of the latent variables.

A case in which the signal change apparatuses in the first and second embodiments use an objective function shown in Formula (5) as an optimization condition in learning of the neural network G serving as the generator and the neural network D serving as the discriminator has been described. However, the objective function representing the optimization condition in learning of the neural network G serving as the generator and the neural network D serving as the discriminator is not limited thereto. For example, the signal change apparatus may use any extended model including a least squares GAN, a Wasserstein GAN, and the like.
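
For orientation, the discriminator-side quantities of the two extended models named above can be sketched as follows. The formulas are the commonly published forms of the least squares GAN and the Wasserstein GAN and are not taken from this description; the function names are hypothetical, and the inputs are lists of discriminator outputs for true and generated data.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def lsgan_discriminator_loss(d_real, d_fake):
    """Least squares GAN: push outputs for true data toward 1 and for generated data toward 0."""
    return (0.5 * mean([(d - 1.0) ** 2 for d in d_real])
            + 0.5 * mean([d ** 2 for d in d_fake]))


def wasserstein_critic_value(d_real, d_fake):
    """Wasserstein GAN: the critic maximizes the gap between mean scores (outputs are unbounded)."""
    return mean(d_real) - mean(d_fake)
```

The Wasserstein formulation additionally requires a constraint on the critic (for example, weight clipping or a gradient penalty), which is omitted from this sketch.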

A case in which the neural network D serving as the discriminator in the first and second embodiments discriminates whether or not image data generated by the generator follows the same distribution as true image data under an attribute vector has been described. However, a target to be discriminated by the neural network D serving as the discriminator is not limited thereto. For example, the neural network D serving as the discriminator may discriminate whether or not generated image data follows the same distribution as the true image data. In this case, the result of discriminating whether or not the image data includes an attribute vector may be added to the objective function in learning of the neural network G serving as the generator and the neural network D serving as the discriminator. When the image data includes the attribute vector, this means that an attribute (a feature) indicated by the attribute vector is included in the image data. The discrimination of whether or not image data includes an attribute vector may be executed by, for example, a neural network Q_(l) (l=1, 2, . . . , L) for estimating the P(c₁|x) and P(c|x, p) approximation distributions.

When the result of discriminating whether or not the image data includes the attribute vector is added to the objective function serving as the optimization condition, for example, the objective function is represented by Formula (9). In learning using the objective function represented by Formula (9), learning of the neural network G serving as the generator, the neural network D serving as the discriminator, and the neural network Q_(l) (l=1, 2, . . . , L) for estimating the P(c₁|x) and P(c|x, p) approximation distributions is performed.

[Expression 14]

$\min\limits_{G}\max\limits_{D}\min\limits_{Q_{1},\ldots,Q_{L}} \mathcal{L}_{GAN}(D,G) - \lambda_{1}\mathcal{L}_{MI/AC}(G,Q_{1}) - \sum\limits_{l=2}^{L}\lambda_{l}\mathcal{L}_{HCMI}(G,Q_{l}) \qquad (9)$

In Formula (9), λ₁, . . . , λ_(L) are trade-off parameters. L_(GAN)(D, G) is represented by Formula (10-1). L_(MI/AC)(G, Q₁) represents that either L_(MI)(G, Q₁) represented by Formula (10-2) or L_(AC)(G, Q₁) represented by Formula (10-3) is used. When the learning data does not include an attribute vector, L_(MI)(G, Q₁) is used. When the learning data includes an attribute vector, L_(AC)(G, Q₁) is used. L_(HCMI)(G, Q_(l)) is represented by Formula (10-4).

[Expression 15]

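$\mathcal{L}_{GAN}(D,G) = \mathbb{E}_{x \sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_{z}(z)}[\log(1 - D(G(z)))] \qquad (10\text{-}1)$

$\mathcal{L}_{MI}(G,Q_{1}) = \mathbb{E}_{c_{1} \sim P(c_{1}),\, x \sim G(\hat{c}_{L}, z)}[\log Q_{1}(c_{1}|x)] \qquad (10\text{-}2)$

$\mathcal{L}_{AC}(G,Q_{1}) = \mathbb{E}_{c_{1} \sim P(c_{1}),\, x \sim G(\hat{c}_{L}, z)}[\log Q_{1}(c_{1}|x)] + \mathbb{E}_{c_{1}, x \sim P_{data}(c_{1}, x)}[\log Q_{1}(c_{1}|x)] \qquad (10\text{-}3)$

$\mathcal{L}_{HCMI}(G,Q_{l}) = \mathbb{E}_{c \sim P(c|p),\, x \sim G(\hat{c}_{L}, z)}[\log Q_{l}(c|x,p)] \qquad (10\text{-}4)$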

In Formulas (10-1) to (10-4), x˜P_(data)(x) represents that true image data x is sampled from learning data. z˜P_(z)(z) represents that a latent variable z (z_(i) and z_(a)) is generated from a given data distribution. c₁˜P(c₁) represents that an attribute vector c₁ of a first layer is generated from a given data distribution. x˜G(ĉ_(L), z) represents that image data is generated by the neural network G serving as the generator on the basis of a latent variable ĉ_(L) representing each attribute in a layer L and a latent variable z (z_(i) and z_(a)). c₁, x˜P_(data)(c₁, x) represents that true image data x and an attribute vector c₁ corresponding to the image data x are sampled from learning data. c˜P(c|p) represents that the latent variable c is sampled in accordance with a distribution P(c|p). In Formula (10-4), c is a latent variable representing each attribute of an l^(th) layer, and p is a latent variable representing each attribute of an (l−1)^(th) layer.
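
As a rough illustration of how Formulas (9) and (10-1) to (10-4) combine, the following sketch evaluates sample averages in place of the expectations. The inputs are assumed to be already computed: `d_real` and `d_fake` hold discriminator outputs D(x) and D(G(z)), and each probability list holds the values that a network Q assigns to the latent variables actually sampled (for example, Q₁(c₁|x)). All function names are hypothetical.

```python
import math


def mean_log(probabilities):
    """Sample average of log p, used for every expectation of a log-probability."""
    return sum(math.log(p) for p in probabilities) / len(probabilities)


def l_gan(d_real, d_fake):
    """Formula (10-1): E[log D(x)] + E[log(1 - D(G(z)))]."""
    return mean_log(d_real) + mean_log([1.0 - d for d in d_fake])


def l_mi(q1_on_generated):
    """Formula (10-2): E[log Q1(c1|x)] over generated data."""
    return mean_log(q1_on_generated)


def l_ac(q1_on_generated, q1_on_real):
    """Formula (10-3): the term of (10-2) plus the same term over labeled true data."""
    return mean_log(q1_on_generated) + mean_log(q1_on_real)


def l_hcmi(ql_on_generated):
    """Formula (10-4): E[log Ql(c|x, p)] for one layer l >= 2."""
    return mean_log(ql_on_generated)


def objective(gan_term, first_layer_term, hcmi_terms, lambdas):
    """Formula (9): L_GAN - lambda_1 * L_MI/AC - sum over l >= 2 of lambda_l * L_HCMI."""
    value = gan_term - lambdas[0] * first_layer_term
    for lam, term in zip(lambdas[1:], hcmi_terms):
        value -= lam * term
    return value
```

Here `first_layer_term` is the output of `l_mi` when the learning data has no attribute vector and the output of `l_ac` when it does, following the selection rule stated for L_(MI/AC).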

In the signal change apparatus of the second embodiment, when the discriminator discriminates whether or not the generated image data follows the same distribution as the true image data, the learning unit 30 may include a configuration having a single-layer neural network shown in FIG. 13 instead of the configuration having a three-layer neural network shown in FIG. 9. When the learning unit 30 includes the configuration of the single-layer neural network shown in FIG. 13, the learning unit 30 includes a neural network G₃ operating as a generator, a neural network D₃ operating as a discriminator, and neural networks Q₁, Q₂, and Q₃ for estimating distributions of latent variables c₁, c₂′, and c₃′ representing each attribute.

In learning of each neural network, the learning unit 30 fixes the parameters of the neural networks other than the one that is the learning target and updates the parameters of the learning-target neural network. The learning of each of the neural networks provided in the learning unit 30 is iterated for each piece of learning data as in the learning described in the first and second embodiments.
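The alternating update described above can be sketched as follows. This is an illustrative toy, assuming that each neural network is reduced to a single scalar parameter and that the update is a plain gradient step. Neither assumption is stated in the embodiments, and `update_one_network`, `train_round`, and `grad_fn` are hypothetical names.

```python
def update_one_network(params, grads, target, lr=0.1):
    """Return new parameters in which only `target` has moved.

    params : dict mapping a network name (e.g. "G", "D", "Q1") to its parameter
    grads  : dict mapping the same names to gradients
    All networks other than `target` keep their current (fixed) parameter.
    """
    return {
        name: (value - lr * grads[name] if name == target else value)
        for name, value in params.items()
    }


def train_round(params, grad_fn, order, lr=0.1):
    """Visit each network in `order` once, updating one learning target at a time."""
    for target in order:
        grads = grad_fn(params)  # gradients evaluated at the current parameters
        params = update_one_network(params, grads, target, lr)
    return params
```

One call to `train_round` corresponds to one pass over the networks for a single piece of learning data; the pass would be repeated for each piece of learning data.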

When the neural network Q₁ is learned, the learning unit 30 updates a parameter of the neural network Q₁ on the basis of the latent variables c₂ and c₃ in which predetermined initial values are set and the latent variables z₃ and c₁ generated from a given data distribution. When the neural network Q₂ is learned, the learning unit 30 updates a parameter of the neural network Q₂ on the basis of the latent variable c₃ in which the initial value is set and the latent variables z₃, c₁, and c₂ generated from a given data distribution. When the neural network Q₃ is learned, the learning unit 30 updates a parameter of the neural network Q₃ on the basis of the latent variables z₃, c₁, c₂, and c₃ generated from a given data distribution.
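The selection of latent variables for each learning target can be summarized by the following sketch. It only encodes which variables are generated from a distribution and which are held at their initial values. The function `latent_inputs` and the dictionaries `samplers` and `initial` are hypothetical and are supplied by the caller.

```python
# Latent variables generated from a given data distribution for each learning target;
# all remaining attribute variables are set to their predetermined initial values.
SAMPLED_BY_TARGET = {
    "Q1": ("z3", "c1"),
    "Q2": ("z3", "c1", "c2"),
    "Q3": ("z3", "c1", "c2", "c3"),
}


def latent_inputs(target, samplers, initial):
    """Build the latent inputs used when learning `target` ("Q1", "Q2", or "Q3").

    samplers : dict mapping a variable name to a zero-argument sampling function
    initial  : dict mapping "c2" and "c3" to their predetermined initial values
    """
    sampled = SAMPLED_BY_TARGET[target]
    inputs = {}
    for name in ("z3", "c1", "c2", "c3"):
        if name in sampled:
            inputs[name] = samplers[name]()
        else:
            inputs[name] = initial[name]
    return inputs
```

For example, `latent_inputs("Q2", samplers, initial)` samples z₃, c₁, and c₂ and takes c₃ from `initial`.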

The initial values to be set in the latent variables c₂ and c₃ representing the attribute are determined on the basis of, for example, expected values or average values of values capable of being taken by the latent variables c₂ and c₃. Alternatively, the initial values may be determined on the basis of the number of variables included in the latent variables c₂ and c₃. The learning of the neural networks G₃ and D₃ is similar to the learning described in the second embodiment.
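As one possible reading of the two options above, the following sketch assumes that a latent variable takes one of a finite set of vector values (for example, one-hot vectors). The embodiments do not fix this encoding, and `initial_from_expectation` and `initial_from_count` are hypothetical names.

```python
def initial_from_expectation(values, probs):
    """Expected value of a latent variable over its possible values.

    values : list of equally sized vectors the variable can take
    probs  : probability of each vector (assumed to sum to 1)
    """
    dim = len(values[0])
    return [sum(p * v[i] for v, p in zip(values, probs)) for i in range(dim)]


def initial_from_count(k):
    """Initial value determined only by the number k of variables: 1/k for each."""
    return [1.0 / k] * k
```

Under these assumptions the two options coincide for a one-hot variable with equally likely categories: the expected value of a one-hot vector over k categories is 1/k in every position.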

The discrimination of whether or not image data includes an attribute vector may be performed by the neural network D serving as the discriminator. When the discriminator discriminates whether or not image data includes an attribute vector, the discriminator may further include a neural network that determines whether or not each attribute is included in the input image data.

The signal change apparatuses may apply known image processing technology to the generated image data. For example, the signal change apparatuses may perform super-resolution processing or correction of the image quality on the generated image.

The signal change apparatuses and the learning apparatuses in the above-described embodiments may be implemented by a computer. In this case, the signal change apparatuses and the learning apparatuses may be implemented by recording a program for implementing their functions on a computer-readable recording medium and causing a computer system to read and execute the program recorded on the recording medium. It is to be noted that the “computer system” described here is assumed to include an operating system (OS) and hardware such as peripheral devices. Moreover, the “computer-readable recording medium” refers to a portable medium such as a flexible disk, a magneto-optical disc, ROM, and a compact disc (CD)-ROM, and a storage apparatus such as a hard disk embedded in the computer system. Furthermore, the “computer-readable recording medium” may also include a computer-readable recording medium for dynamically holding a program for a short time as in a communication line when the program is transmitted via a network such as the Internet or a communication circuit such as a telephone circuit and a computer-readable recording medium for holding the program for a fixed time as in a volatile memory inside the computer system serving as a server or a client. Moreover, the program may be used to implement some of the above-described functions. The program may implement the above-described functions in combination with a program already recorded on the computer system. The program may be implemented using a programmable logic device such as a field programmable gate array (FPGA).

INDUSTRIAL APPLICABILITY

The present invention can be used to, for example, change a signal such as an image. According to the present invention, it is possible to appropriately change a signal.

DESCRIPTION OF REFERENCE SIGNS

-   1 Encoder
-   2 Generator
-   3 Discriminator
-   10 Input unit
-   20 Arithmetic unit
-   30 Learning unit
-   40 Neural network storage unit
-   50 Prediction unit
-   52 Variable extraction unit
-   53 Signal output unit
-   54 Signal attribute value display unit
-   56 Changed attribute value acquisition unit
-   58 Prediction unit
-   60 Variable extraction unit
-   62 Change unit
-   64 Signal generation unit
-   66 Changed signal output unit
-   90 Output unit
-   92 Attribute change screen
-   94 Radio button
-   96 Slide bar
-   98A Image display region
-   98B Reference image display region
-   100 Signal change apparatus
-   292 Attribute change screen
-   294 Radio button
-   298A, 298B Audio display region
-   298C Reference audio display region
-   299 Button
-   392 Attribute change screen
-   398A Text display region
-   398B Reference text display region
-   492 Attribute change screen
-   498A Moving-image display region
-   498B Reference moving-image display region
-   499 Button
-   E, D, G, Q₁, Q₂, Q₃ Neural network

The invention claimed is:
1. A signal change apparatus comprising: a signal outputter that outputs an acquired signal; a signal attribute value displayer that displays a value of an attribute related to an element constituting a target represented by the acquired signal or a signal generation source in a state in which a change instruction of the value of the attribute is able to be received; a changed attribute value acquirer that acquires a changed value of the attribute when the change instruction of the value of the attribute is received; a changer that changes the value of the attribute for which the change instruction has been received on the basis of the changed value of the attribute acquired by the changed attribute value acquirer; a changed signal outputter that outputs a changed signal in which the value of the attribute has been changed; and a variable extractor that extracts, from the acquired signal, a plurality of latent variables that include a first latent variable representing identity of the signal and a second latent variable that is independent of the first latent variable and that is a latent variable representing each attribute of the signal or a latent variable based on the latent variable, and acquires a third latent variable representing each attribute of the changed signal by converting the second latent variable using an attribute vector based on the acquired signal, wherein the signal attribute value displayer outputs the third latent variable, the changer changes a value of the third latent variable on the basis of the changed value of the attribute acquired by the changed attribute value acquirer, the value of the third latent variable is constrained by a value of the attribute vector, and each of the signal outputter, the signal attribute value displayer, the changed attribute value acquirer, the changer, the changed signal outputter, and the variable extractor is implemented by: i) computer executable instructions executed by at least one processor, ii) at least one circuit, or iii) a combination of the computer executable instructions and the at least one circuit.
2. The signal change apparatus according to claim 1, wherein each of the acquired signal and the changed signal is an image, and the attribute is an attribute related to an element constituting a subject representing the image.
3. The signal change apparatus according to claim 1 or 2, wherein the signal attribute value displayer displays the value of the attribute by means of a controller indicating the value of the attribute in the state in which the change instruction of the value of the attribute is able to be received.
4. A signal change apparatus comprising: a variable extractor that extracts, from an input signal, a plurality of latent variables that include a first latent variable representing identity of the input signal and a second latent variable that is independent of the first latent variable and that is a latent variable representing each attribute of the input signal or a latent variable based on the latent variable, and acquires a third latent variable representing each attribute of the input signal by converting the second latent variable using an attribute vector based on the input signal; a changer that changes a value of the third latent variable acquired by the variable extractor by replacing the value of the acquired third latent variable with a value of a latent variable representing an attribute extracted from a signal of a transfer source; and a signal generator that generates a signal from the third latent variable changed by the changer, wherein the value of the third latent variable is constrained by a value of the attribute vector, and each of the variable extractor, the changer, and the signal generator is implemented by: i) computer executable instructions executed by at least one processor, ii) at least one circuit, or iii) a combination of the computer executable instructions and the at least one circuit.
5. A signal change method comprising: outputting, by a signal outputter, an acquired signal; displaying, by a signal attribute value displayer, a value of an attribute related to an element constituting a target represented by the acquired signal or a signal generation source in a state in which a change instruction of the value of the attribute is able to be received; acquiring, by a changed attribute value acquirer, a changed value of the attribute when the change instruction of the value of the attribute is received; changing, by a changer, the value of the attribute for which the change instruction has been received on the basis of the changed value of the attribute acquired by the changed attribute value acquirer; outputting, by a changed signal outputter, a changed signal in which the value of the attribute has been changed; extracting, by a variable extractor, a plurality of latent variables that include a first latent variable representing identity of the signal and a second latent variable that is independent of the first latent variable and that is a latent variable representing each attribute of the signal or a latent variable based on the latent variable from the acquired signal, and acquiring, by the variable extractor, a third latent variable representing each attribute of the changed signal by converting the second latent variable using an attribute vector based on the acquired signal; outputting, by the signal attribute value displayer, the third latent variable; and changing, by the changer, a value of the third latent variable on the basis of the changed value of the attribute acquired by the changed attribute value acquirer, wherein the value of the third latent variable is constrained by a value of the attribute vector.
6. A signal change method comprising: extracting, by a variable extractor, a plurality of latent variables that include a first latent variable representing identity of an input signal and a second latent variable that is independent of the first latent variable and that is a latent variable representing each attribute of the input signal or a latent variable based on the latent variable from the input signal, and acquires a third latent variable representing each attribute of the input signal by converting the second latent variable using an attribute vector based on the input signal; outputting, by a signal attribute value displayer, the third latent variable; changing, by a changer, a value of the third latent variable acquired by the variable extractor on the basis of a changed value of the third latent variable acquired by a changed attribute value acquirer; and generating, by a signal generator, a signal from the third latent variable changed by the changer, wherein the value of the third latent variable is constrained by a value of the attribute vector.
7. A signal change method comprising: extracting, by a variable extractor, a plurality of latent variables that include a first latent variable representing identity of an input signal and a second latent variable that is independent of the first latent variable and that is a latent variable representing each attribute of the input signal or a latent variable based on the latent variable from the input signal, and acquires a third latent variable representing each attribute of the input signal by converting the second latent variable using an attribute vector based on the input signal; changing, by a changer, a value of the third latent variable acquired by the variable extractor by replacing the value of the acquired third latent variable with a value of a latent variable representing an attribute extracted from a signal of a transfer source; and generating, by a signal generator, a signal from the third latent variable changed by the changer, wherein the value of the third latent variable is constrained by a value of the attribute vector.
8. The signal change apparatus according to claim 1, wherein the changer performs replacement on the value of the third latent variable on the basis of the changed value of the attribute acquired by the changed attribute value acquirer, the signal change apparatus further comprises a signal generator that generates the changed signal using the value of the third latent variable after the replacement, a value of the first latent variable, and at least one pre-learned neural network, and the signal generator is implemented by: i) the computer executable instructions, ii) the at least one circuit, or iii) the combination of the computer executable instructions and the at least one circuit.
9. A signal change apparatus comprising: a signal outputter that outputs an acquired signal; a signal attribute value displayer that displays a value of an attribute related to an element constituting a target represented by the acquired signal or a signal generation source in a state in which a change instruction of the value of the attribute is able to be received; a changed attribute value acquirer that acquires a changed value of the attribute when the change instruction of the value of the attribute is received; a changer that changes the value of the attribute for which the change instruction has been received on the basis of the changed value of the attribute acquired by the changed attribute value acquirer; a changed signal outputter that outputs a changed signal in which the value of the attribute has been changed; and a variable extractor that extracts, from the acquired signal, a plurality of latent variables that include a first latent variable representing identity of the signal and a second latent variable that is independent of the first latent variable and that is a latent variable representing each attribute of the signal or a latent variable based on the latent variable, and acquires a third latent variable representing each attribute of the changed signal by converting the second latent variable using an attribute vector based on the acquired signal, wherein the signal attribute value displayer outputs the third latent variable, the changer changes a value of the third latent variable by performing replacement on the value of the third latent variable on the basis of the changed value of the attribute acquired by the changed attribute value acquirer, the signal change apparatus further comprises a signal generator that generates the changed signal using the value of the third latent variable after the replacement, a value of the first latent variable, and at least one pre-learned neural network, the at least one neural network is generated by performing learning in accordance with an optimum condition that a first neural network serving as a generator that generates a signal and a second neural network serving as a discriminator that discriminates whether or not the signal generated by the generator follows the same distribution as a true signal contend with each other on the basis of the first latent variable representing the identity of the signal and the third latent variable that has been obtained by converting the second latent variable that is independent of the first latent variable and that is the latent variable representing each attribute of the signal or the latent variable based on the latent variable using the attribute vector, and each of the signal outputter, the signal attribute value displayer, the changed attribute value acquirer, the changer, the changed signal outputter, the variable extractor, and the signal generator is implemented by: i) computer executable instructions executed by at least one processor, ii) at least one circuit, or iii) a combination of the computer executable instructions and the at least one circuit.
10. A non-transitory computer-readable medium storing a program for causing a computer to function as the signal change apparatus according to claim 1.
11. A non-transitory computer-readable medium storing a program for causing a computer to function as the signal change apparatus according to claim 4.